GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, pp. 119-120, 2024
Obese and overweight individuals are at high risk for chronic diseases such as sleep apnea and diabetes. It is therefore necessary to track eating behavior to determine the causes of obesity; however, following specific individuals and observing their eating behavior is time- and labor-intensive, so a method for automatically monitoring eating behavior is needed. As one such approach, we propose a method that conveniently recognizes the food category from food-intake sounds recorded by microphones (a below-the-ear microphone, a throat microphone, and an acoustic microphone), which places little burden on the body and is preferable from the viewpoint of privacy protection. Furthermore, a comparison of mel filterbank (MFB) features with large-scale pre-trained speech models (wav2vec 2.0, WavLM, and HuBERT) showed the effectiveness of the large-scale pre-trained speech models in the food recognition task.
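As a rough illustration of how utterance-level features could be taken from one of the pre-trained models named above, the sketch below mean-pools HuBERT hidden states over a food-intake sound clip; the checkpoint name, 16 kHz input, and mean pooling are assumptions, and the downstream food-category classifier is not shown.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, HubertModel

# Checkpoint name is illustrative; the paper does not state which HuBERT model was used.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

def clip_embedding(waveform, sr=16000):
    """waveform: 1-D numpy array of a food-intake sound, resampled to 16 kHz."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0)             # mean-pooled clip-level feature
```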
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, pp. 808-810, 2024
To enhance speaker verification for short utterances, we have developed a Same Speaker Identification Deep Neural Network (SSI-DNN). This network identifies whether two utterances of the same text were spoken by the same speaker, achieving higher accuracy by focusing on identical texts. In this paper, we extend the detection target of the SSI-DNN from monosyllabic utterances to word utterances to improve speaker recognition performance. Experimental results showed that the SSI-DNN trained on word utterances achieved an EER of 0.1% to 2.8%, indicating that it outperformed x-vector-based speaker verification, a representative speaker verification method.
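The abstract does not spell out the SSI-DNN architecture; the following is only a minimal sketch of a pairwise same-speaker classifier over two fixed-length utterance embeddings, with the embedding dimension and layer sizes chosen arbitrarily.

```python
import torch
import torch.nn as nn

class SameSpeakerNet(nn.Module):
    """Takes embeddings of two word utterances (of the same text) and outputs
    the probability that both were spoken by the same speaker."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim * 2, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, emb_a, emb_b):
        return self.net(torch.cat([emb_a, emb_b], dim=-1)).squeeze(-1)

# Training pairs would mix same-word utterances from the same speaker (label 1)
# and from different speakers (label 0), optimized with nn.BCELoss().
```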
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, pp. 141-143, 2024
Hands-free control of shower settings such as temperature is highly desirable, enhancing user convenience when both hands are occupied or the eyes are closed. In this paper, we propose a speaker-dependent, template-based isolated word recognition system using pre-trained large speech models (LSMs) to realize voice-activated shower control with a single microphone. Specifically, we examine the performance of three LSMs (wav2vec 2.0, HuBERT, and WavLM) as well as conventional MFCCs as features. Additionally, we investigate speech enhancement using a Convolutional Recurrent Neural Network (CRN) to improve robustness against shower noise. Our experiments on recognizing 30 words at SNRs ranging from -5 dB to 20 dB demonstrate that HuBERT achieves the highest recognition accuracy (77.8% to 95.6%). The CRN, on the other hand, improved recognition accuracy only under the -5 dB condition, and even then the accuracy was only 80.8%.
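Template-based isolated word recognition is commonly realized by dynamic time warping (DTW) between enrolled templates and a test utterance; the sketch below assumes that matching scheme (the paper does not state it) and operates on per-frame LSM features such as HuBERT hidden states.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two feature sequences (frames x dims),
    e.g. HuBERT hidden states of a template and of a test word."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])           # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],               # insertion
                                 cost[i, j - 1],               # deletion
                                 cost[i - 1, j - 1])           # match
    return cost[n, m] / (n + m)                                # length-normalized path cost

def recognize(test_feats, templates):
    """templates: dict mapping each of the 30 command words to its enrolled feature sequences."""
    return min(templates,
               key=lambda w: min(dtw_distance(test_feats, t) for t in templates[w]))
```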
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, pp. 805-807, 2024
Recent advances in AI technology have brought not only many benefits but also considerable risks due to malicious use of the technology. One key example is spoofing of speaker verification systems through speech synthesis and voice conversion. To tackle this challenge, we previously proposed a two-step matching method for robust speaker verification, in which a user registers an emotion with the system in advance and is accepted only when speaking with the specified emotion. This method reduced the false acceptance rate but increased the false rejection rate. To overcome this problem, in this work we propose a novel method that integrates the speaker and emotion verification scores. Experiments revealed that, by assigning optimal weights to the speaker and emotion information contained in the speech, the proposed method reduces the equal error rate compared with the conventional method.
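A minimal sketch of the score-integration idea, assuming a simple weighted sum of the two verification scores with a weight tuned on development data; the paper's exact fusion rule may differ. EER is computed from the ROC curve.

```python
import numpy as np
from sklearn.metrics import roc_curve

def fuse_scores(speaker_scores, emotion_scores, alpha):
    # Weighted sum of speaker- and emotion-verification scores
    # (alpha is assumed to be tuned on a development set).
    return alpha * speaker_scores + (1.0 - alpha) * emotion_scores

def equal_error_rate(scores, labels):
    # labels: 1 = genuine trial (enrolled speaker, registered emotion), 0 = impostor
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

# Example weight search on development data:
# best_alpha = min(np.linspace(0, 1, 21),
#                  key=lambda a: equal_error_rate(fuse_scores(s_dev, e_dev, a), y_dev))
```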
Keywords are a fundamental part of information retrieval, used for everything from searching for documents to describing them. Typically, keyword-extraction algorithms require a document collection in order to extract keywords, and extracting keywords without a document collection is gaining importance. Previous research has addressed this problem, but two issues remain: 1) the quality of the extracted keywords was not evaluated by how well they perform in IR tasks, and 2) the algorithms were designed for only one language. This paper proposes a new algorithm that is applicable to multiple languages and extracts effective keywords.
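The abstract does not describe the proposed algorithm itself. Purely as an illustration of corpus-free, largely language-independent extraction, the sketch below ranks the words of a single document by PageRank over a co-occurrence graph (a TextRank-style method, not the one proposed in the paper); for languages written without whitespace, a word segmenter would replace the regular expression.

```python
import re
import networkx as nx

def extract_keywords(text, top_k=10, window=4):
    """Rank candidate keywords of a single document without any document collection."""
    words = re.findall(r"\w+", text.lower())        # naive tokenization; use a segmenter for CJK text
    graph = nx.Graph()
    for i, w in enumerate(words):
        for other in words[i + 1: i + window]:      # link words that co-occur within a small window
            if other != w:
                graph.add_edge(w, other)
    ranks = nx.pagerank(graph)                      # graph centrality as the keyword score
    return sorted(ranks, key=ranks.get, reverse=True)[:top_k]
```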
Quantifiers and numerals often cause errors in Chinese-Japanese machine translation. In this paper, we propose an approach that classifies quantifiers based on their grammatical features and handles each class accordingly. First, morphological analysis is performed on example sentences containing quantifiers and numerals collected from a Chinese-Japanese aligned corpus. Statistics are then gathered on the quantifier types and on the word meanings of the nouns they modify, and translation rules for quantifiers are acquired from the differences in quantifier type and position between Chinese and Japanese. The translation experiment system consists of two modules: one determines whether a quantifier should be translated, and the other selects the translation form when it should. An evaluation experiment on Chinese-Japanese quantifier translation was conducted using the acquired rules. Finally, the adaptability of the experimental data was verified and the effectiveness of the proposed method was demonstrated.
In Japanese, a causative sentence is expressed as X (causer) GA Y (causee) NI/WO V-SASERU, where SASERU attaches to the irrealis form of the verb. Causative sentences in Chinese take the form X (causer) JIAO/RANG/SHI Y (causee) V, where the causative markers JIAO, RANG, and SHI combine with the verb to express the meaning of SASERU. In machine translation, failure to correctly recognize the Chinese causative pattern "causative marker + verb" becomes a major obstacle when translating into Japanese. In this paper, we propose translation rules for causative expressions in Chinese-Japanese machine translation by collecting a large number of example sentences from textbooks and websites, extracting causative expressions and related information, analyzing that information, and examining the features of causative expressions.
Super-Function (SF) based machine translation is a corpus-based translation method that requires no full syntactic or semantic analysis, so processing is very fast. Moreover, because SFs are created from a bilingual corpus, the resulting translations are very natural. In this research, the translation system was built on the Web so that as many users as possible could evaluate SF-based machine translation. This paper describes the structure of the constructed translation system, discusses the problems that became clear during its construction, and proposes methods for solving them.
Determining the semantic structure of a sentence after parsing is an important problem. In this paper, we propose a method for automatically annotating the Penn Chinese Treebank with semantic dependency structure. First, a small portion of the treebank was manually annotated with headwords and semantic dependency relations to create test data. Two supervised machine learning algorithms with different feature sets were then applied to learn the relations. Finally, a set of preference rules based on characteristics of Chinese was created to disambiguate problematic tree structures found in the original corpus. Experimental results show that the proposed approach is effective for automatically determining Chinese semantic dependency structure.
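As a rough idea of the supervised step, the sketch below trains a classifier to predict the semantic relation of a head/dependent pair from simple categorical features; the feature templates, label set, and choice of logistic regression are illustrative, not the two algorithms actually used in the paper.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy head/dependent pairs with hypothetical features and relation labels.
train_X = [
    {"head_pos": "VV", "dep_pos": "NN", "head_word": "吃", "dep_word": "苹果", "direction": "right"},
    {"head_pos": "VV", "dep_pos": "NR", "head_word": "吃", "dep_word": "张三", "direction": "left"},
]
train_y = ["patient", "agent"]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_X, train_y)

print(model.predict([{"head_pos": "VV", "dep_pos": "NN",
                      "head_word": "看", "dep_word": "书", "direction": "right"}]))
```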
In recent years, IP telephone use has spread rapidly thanks to the development of VoIP (Voice over IP) technology. However, an unavoidable problem of the IP telephone is deterioration of speech due to packet loss, which often occurs on wireless networks. To overcome this problem, we propose a novel packet loss concealment algorithm using speech recognition and synthesis. The proposed method uses linguistic information and can deal with the loss of syllable units, which conventional methods are unable to handle. We conducted subjective and objective evaluation experiments, and the results showed the effectiveness of the proposed method. Although the proposed method introduces a processing delay, we believe it will open up new applications for speech recognition and speech synthesis technology.
This paper introduces CENSREC-3, a common database, evaluation framework, and baseline recognition results for in-car speech recognition, produced by the IPSJ SIG-SLP Noisy Speech Recognition Evaluation Working Group. The CENSREC-3 recognition task is isolated word recognition inside a car under real driving conditions. Speech data were recorded with two microphone types, a close-talking microphone and a hands-free (distant) microphone, under 16 environments combining three driving speeds and six in-car conditions. CENSREC-3 provides six evaluation environments designed using the speech data collected under these various conditions.
In this paper, we propose to use Simple Principal Component Analysis (SPCA) for dimensionality reduction in the vector space information retrieval model. The SPCA algorithm is a fast, data-oriented method that does not require computation of the variance-covariance matrix. Because SPCA estimates principal components iteratively, we also propose a criterion to determine convergence, with which the optimum number of iterations for each principal component can be determined. Experimentally, we show that the SPCA-based method improves on the conventional SVD-based method despite its smaller amount of computation. This advantage of SPCA can be attributed to its iterative procedure, which is similar to clustering methods such as k-means clustering. In addition, the proposed method, which orthogonalizes the basis vectors, also achieved much higher accuracy than the conventional random projection method based on k-means clustering.
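A minimal numpy sketch of the SPCA idea as described in the abstract: each principal component is estimated by iterating on the data matrix directly (no variance-covariance matrix), components are re-orthogonalized against those already found, and iteration stops when the direction change falls below a threshold. The exact convergence criterion proposed in the paper may differ from the simple form assumed here.

```python
import numpy as np

def simple_pca(X, n_components, max_iter=100, tol=1e-6, seed=0):
    """Iterative SPCA sketch on a (documents x terms) matrix X."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                    # center the data
    components = []
    for _ in range(n_components):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            s = X @ w                         # project every document onto w
            w_new = X.T @ s                   # accumulate projection-weighted documents
            for c in components:              # orthogonalize against earlier components
                w_new -= (w_new @ c) * c
            w_new /= np.linalg.norm(w_new)
            if np.linalg.norm(w_new - w) < tol:   # assumed convergence criterion
                w = w_new
                break
            w = w_new
        components.append(w)
        X = X - np.outer(X @ w, w)            # deflate the explained direction
    return np.vstack(components)              # (n_components, n_terms) basis
```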
To meet the demand for efficiently acquiring necessary information from large electronic texts, question answering (QA) technology, which automatically returns a concise answer to a question asked in the user's natural language, has attracted wide attention in recent years. Although research on QA systems in China started later than in Western countries and Japan, it has recently attracted increasing attention. In this paper, we propose a QA architecture that combines answer retrieval for frequently asked questions based on common knowledge with document retrieval for sightseeing information. To improve answer accuracy, we use a combined model based on a statistical vector space model (VSM) and shallow semantic analysis, and we limit the domain to sightseeing information. A Chinese QA system for sightseeing based on the proposed method has been built. In the evaluation experiments, a retrieval result was regarded as correct if the correct answer appeared among the top three candidates by similarity, and high accuracy was achieved under this criterion. The experiments demonstrate the effectiveness of our method and the feasibility of developing QA technology based on it.
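The document-retrieval side of such a system can be illustrated with a plain VSM ranking: TF-IDF vectors and cosine similarity, keeping the top three candidates as in the evaluation criterion above. This is only a generic sketch; the shallow semantic analysis and FAQ matching of the proposed system are omitted, and Chinese text would first need word segmentation before vectorization.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_top3(question, documents):
    """Rank sightseeing documents against a question with a TF-IDF vector space model."""
    vectorizer = TfidfVectorizer()                   # assumes pre-segmented (space-separated) text
    doc_vecs = vectorizer.fit_transform(documents)
    q_vec = vectorizer.transform([question])
    sims = cosine_similarity(q_vec, doc_vecs).ravel()
    top3 = np.argsort(sims)[::-1][:3]                # "correct" if the answer appears among these
    return [(documents[i], float(sims[i])) for i in top3]
```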
This paper reports an evaluation of the European Telecommunications Standards Institute (ETSI) standard Distributed Speech Recognition (DSR) front-end through continuous speech recognition on a Japanese speech corpus, and proposes Bias Removal Methods (BRMs) that reduce the distortion between the feature parameters and the VQ codebook. Experimental results show that (1) using non-quantized features in acoustic model training can improve the recognition performance of DSR front-end features, and (2) broadening the analysis band can improve recognition performance at the same bitrate. The proposed method improves recognition performance under the DSR condition; notably, we observed an 18% relative reduction in error rate using the proposed method under mismatched channel conditions.
Performance degradation caused by environmental interference such as noise and reverberation is inevitable for current state-of-the-art speech recognition, and many studies have addressed this problem. However, because those methods were developed for different tasks and evaluated on different corpora, it has been very difficult to measure actual improvements and compare the methods. For this reason, several projects have been organized in the USA and Europe. This paper briefly introduces those projects and describes the current activities, aims, and future road map of a common standardized framework for noisy speech recognition being organized by the authors in Japan.
This paper introduces CENSREC-3, a common database, evaluation framework, and baseline recognition results for in-car speech recognition, produced by the IPSJ SIG-SLP Noisy Speech Recognition Evaluation Working Group. CENSREC-3, a sequel to AURORA-2J, is a standard evaluation environment for noisy speech recognition and provides an evaluation framework for isolated word recognition inside a car under real driving conditions. Speech data were recorded with two microphone types, a close-talking microphone and a hands-free (distant) microphone, under 16 environments combining three driving speeds and six in-car conditions, and six evaluation environments designed using these data are provided.