GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics 119-120 2024
Obese and overweight individuals are at high risk for chronic diseases such as sleep apnea and diabetes. Tracking eating behavior is therefore necessary to determine the causes of obesity; however, following specific individuals through their daily lives and observing their eating behavior is time- and labor-intensive, so a method for monitoring eating behavior automatically is needed. As one such approach, we propose a method that recognizes the food category from food intake sounds recorded by microphones (a below-the-ear microphone, a throat microphone, and an acoustic microphone); this approach places little burden on the body and is preferable from the viewpoint of privacy protection. Furthermore, a comparison of MFB with large-scale pre-trained speech models (wav2vec2.0, WavLM, and HuBERT) showed the effectiveness of the pre-trained models in the food recognition task.
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics 808-810 2024
To enhance speaker verification for short utterances, we have developed a Same Speaker Identification Deep Neural Network (SSI-DNN). By focusing on utterances of the same text, this network identifies with greater accuracy whether two utterances were spoken by the same speaker. In this paper, we extend the detection target of the SSI-DNN from monosyllabic utterances to word utterances to improve speaker recognition performance. Experimental results showed that the SSI-DNN trained on word utterances achieved an equal error rate (EER) of 0.1% to 2.8%. These results indicate that the SSI-DNN outperformed the x-vector-based method, a representative speaker verification method.
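For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how it can be estimated from verification scores. The function name, the threshold sweep over observed scores, and the toy scores are illustrative assumptions; they are not taken from the paper's evaluation.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: higher scores mean "more likely the same speaker".
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): one genuine and one impostor trial overlap.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))  # 0.25
```

With finite trial sets the false acceptance and false rejection rates rarely cross exactly, so this sketch reports their mean at the closest point; evaluation toolkits often interpolate instead.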
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics 141-143 2024
Hands-free control of shower settings, such as temperature, is highly desirable because it enhances user convenience when both hands are occupied or the eyes are closed. In this paper, we propose a speaker-dependent, template-based isolated word recognition system using pre-trained large speech models (LSMs) to realize voice-activated shower control with a single microphone. Specifically, we examine the performance of three LSMs (wav2vec2.0, HuBERT, and WavLM) as well as conventional MFCC as features. Additionally, we investigate speech enhancement using a Convolutional Recurrent Neural Network (CRN) to improve robustness against shower noise. Our experiments on recognizing 30 words at SNRs ranging from -5 dB to 20 dB demonstrate that HuBERT achieves the highest recognition accuracy (77.8% to 95.6%). CRN-based enhancement, on the other hand, improved recognition accuracy only under the -5 dB condition, and even there the accuracy reached only 80.8%.
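Template-based isolated word recognition can be sketched as nearest-template matching over feature sequences. The abstract does not state which matching algorithm was used; the sketch below assumes dynamic time warping (DTW) with Euclidean frame distance, and the function names and one-dimensional toy features are hypothetical stand-ins for per-frame LSM or MFCC vectors.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(features, templates):
    """Return the word whose enrolled template is closest to the input."""
    return min(templates, key=lambda w: dtw_distance(features, templates[w]))


# Toy speaker-dependent templates (hypothetical one-dimensional features).
templates = {
    "hot": [[0.0], [1.0], [2.0]],
    "cold": [[5.0], [4.0], [3.0]],
}
query = [[0.1], [0.9], [1.1], [2.1]]
print(recognize(query, templates))  # hot
```

Because the system is speaker-dependent, each user would enroll their own templates; DTW absorbs differences in speaking rate between enrollment and use.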
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics 805-807 2024
Recent advances in AI technology have brought not only many benefits but also considerable risks arising from malicious use of the technology. One key example is spoofing attacks on speaker verification systems using speech synthesis and voice conversion technologies. To tackle this challenge, we previously proposed a two-step matching method for robust speaker verification, in which a user specifies an emotion to the system in advance and is accepted only when speaking with the specified emotion. This previous method reduced the false acceptance rate but increased the false rejection rate. To overcome this problem, in this work we propose a novel method that integrates the speaker verification and emotion verification scores. Experiments revealed that the proposed method reduces the equal error rate relative to the conventional method by assigning optimal weights to the speaker and emotion information contained in the speech.
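The score integration described above can be illustrated with a weighted sum of the two verification scores. The abstract does not give the fusion formula; the linear combination, the function names, and the example values below are assumptions made for illustration only.

```python
def fuse_scores(speaker_score, emotion_score, weight):
    """Linear fusion (assumed form): weight on speaker, 1 - weight on emotion."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * speaker_score + (1.0 - weight) * emotion_score


def accept(speaker_score, emotion_score, weight, threshold):
    """Accept the claim when the fused score reaches the decision threshold."""
    return fuse_scores(speaker_score, emotion_score, weight) >= threshold


# Hypothetical trial: strong speaker match, weak match to the specified emotion.
print(accept(0.9, 0.2, weight=0.5, threshold=0.5))  # True
print(accept(0.9, 0.2, weight=0.5, threshold=0.6))  # False
```

Unlike a two-step scheme that rejects as soon as either check fails, a single fused score lets a strong speaker match compensate for a borderline emotion match, which is one way the false rejection rate could be reduced. In practice the weight would be tuned on development data, for example to minimize the equal error rate.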
Journal of the Acoustical Society of Japan (E) 15(2) 87-96 1994
We describe a new real-time isolated word recognizer with an improved user interface. The recognizer is designed for an Extension Number Guidance System, which looks up and announces an extension number through a telephone dialogue with users. To deal with telephone-quality speech input, which includes noise and distortion introduced during transmission over the telephone network, we developed a feature extraction algorithm and a word detection algorithm. These techniques use wide band-pass filter outputs, which are generally employed to decide whether speech is voiced or unvoiced. To achieve a friendly interface, the system can accept user input at any time by using an echo canceler and the new word detection algorithm. Finally, the recognizer is evaluated using a large telephone voice database consisting of more than 500 speakers.
This manuscript describes two methods that acquire lexical co-occurrence information from example sentences paired with their correct syntactic parse trees, and that apply this information to parsing. The first method focuses on the word that forms the head of a phrase (the governor): it assumes that a co-occurrence relation holds among the governors of the sister nodes in a rewriting rule, and it stores these governor sequences. In the second method, co-occurrence relations are described manually in the rewriting rules, so the relations to be used and their arguments can be defined freely. Both methods compare the parse results with the correct syntactic structure attached to each sentence and classify the co-occurrence relations that appear as positive or negative examples. Stored relations that occur only as negative examples can then be used to resolve ambiguity in parsing. Experiments with both methods measured how parsing performance changes as the number of stored examples grows. Using the stored co-occurrence relations proved effective both in suppressing inappropriate trees and in selecting the correct tree.
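The first method (governors on sister nodes) can be sketched as follows. The tree representation, function names, and the toy sentence fragment are hypothetical and chosen only to illustrate the idea of storing positive and negative examples; the manuscript's actual data structures are not described in the abstract.

```python
from collections import Counter

# Assumed representation: a node is (category, governor, children),
# and a leaf has an empty children list.


def sibling_governors(tree):
    """Yield (rewriting rule, governors of its sister nodes) for each rule used."""
    category, _, children = tree
    if children:
        rule = (category, tuple(child[0] for child in children))
        yield rule, tuple(child[1] for child in children)
        for child in children:
            yield from sibling_governors(child)


def accumulate(correct_tree, parsed_trees, positive, negative):
    """Store co-occurrences as positive (in the correct tree) or negative."""
    gold = set(sibling_governors(correct_tree))
    positive.update(gold)
    for tree in parsed_trees:
        for item in sibling_governors(tree):
            if item not in gold:
                negative[item] += 1


def is_suppressed(tree, positive, negative):
    """Reject a tree containing a co-occurrence seen only as a negative example."""
    return any(item in negative and item not in positive
               for item in sibling_governors(tree))


positive, negative = Counter(), Counter()
correct = ("VP", "eat", [("V", "eat", []), ("NP", "apple", [])])
wrong = ("VP", "eat", [("V", "eat", []), ("NP", "red", [])])
accumulate(correct, [correct, wrong], positive, negative)
print(is_suppressed(wrong, positive, negative))    # True
print(is_suppressed(correct, positive, negative))  # False
```

Filtering on negative-only entries mirrors the use described above: a candidate tree is discarded only when it contains a governor sequence that has never appeared in a correct structure.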