GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, 119-120, 2024
Obese and overweight individuals are at high risk for chronic diseases such as sleep apnea and diabetes. Tracking eating behavior is therefore necessary to determine the causes of obesity; however, following specific individuals and observing their eating behavior is time- and labor-intensive, so a method to monitor eating behavior automatically is needed. As one such approach, we propose a method for conveniently recognizing food categories from food-intake sounds recorded by microphones (a below-the-ear microphone, a throat microphone, and an acoustic microphone), which places little burden on the body and is preferable from the viewpoint of privacy protection. Furthermore, a comparison of Mel filterbank (MFB) features and large-scale pre-trained speech models (wav2vec2.0, WavLM, and HuBERT) showed the effectiveness of the large-scale pre-trained speech models in the food recognition task.
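As a rough illustration of the feature comparison described above (not the authors' implementation), the sketch below extracts an utterance-level embedding from a pre-trained HuBERT model and, for contrast, a mean log-Mel filterbank (MFB) vector; the torchaudio pipeline, mean pooling, and downstream classifier are assumptions.

```python
# Minimal sketch: pre-trained speech-model features vs. MFB features
# for classifying food-intake sound clips. Illustrative only.
import torch
import torchaudio
from torchaudio.pipelines import HUBERT_BASE

bundle = HUBERT_BASE
model = bundle.get_model().eval()

def hubert_embedding(wav_path):
    wav, sr = torchaudio.load(wav_path)
    wav = wav.mean(dim=0, keepdim=True)                       # force mono
    wav = torchaudio.functional.resample(wav, sr, bundle.sample_rate)
    with torch.no_grad():
        feats, _ = model.extract_features(wav)
    return feats[-1].mean(dim=1).squeeze(0)                   # mean-pool last layer

def mfb_embedding(wav_path, n_mels=40):
    wav, sr = torchaudio.load(wav_path)
    mfb = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=n_mels)(wav)
    return (mfb + 1e-10).log().mean(dim=-1).squeeze(0)        # average log-MFB over time

# Either embedding can feed a simple classifier over food categories,
# e.g. logistic regression fit on labelled chewing/swallowing clips.
```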
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, 808-810, 2024
To enhance speaker verification for short utterances, we have developed a Same Speaker Identification Deep Neural Network (SSI-DNN), which determines whether two utterances of the same text were spoken by the same speaker, exploiting the shared text to achieve higher accuracy. In this paper, we extend the target of the SSI-DNN from monosyllabic utterances to word utterances to improve speaker recognition performance. Experimental results showed that the SSI-DNN trained on word utterances achieved equal error rates (EERs) of 0.1% to 2.8%, outperforming the x-vector-based method, a representative approach to speaker verification.
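A minimal sketch of the general idea of same-speaker identification over paired utterances of the same word; the layer sizes, the use of pre-computed utterance embeddings, and the training setup are illustrative assumptions, not the paper's exact SSI-DNN.

```python
# Siamese-style scorer: does this pair of same-text utterances share a speaker?
import torch
import torch.nn as nn

class SameSpeakerNet(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(hidden * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))                 # logit: same speaker vs. different

    def forward(self, emb_a, emb_b):
        h = torch.cat([self.encoder(emb_a), self.encoder(emb_b)], dim=-1)
        return self.head(h).squeeze(-1)

# Training pairs are two recordings of the SAME word, labelled 1 if both were
# spoken by one speaker and 0 otherwise; EER is then computed from the scores.
```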
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, 141-143, 2024
Hands-free control of shower settings, such as temperature, is highly desirable when both hands are occupied or the eyes are closed. In this paper, we propose a speaker-dependent, template-based isolated word recognition system using pre-trained large speech models (LSMs) to realize voice-activated shower control with a single microphone. Specifically, we examine the performance of three LSMs (wav2vec2.0, HuBERT, and WavLM) as well as conventional MFCCs as features. Additionally, we investigate speech enhancement with a Convolutional Recurrent Network (CRN) to improve robustness against shower noise. Experiments on recognizing 30 words at SNRs ranging from -5 dB to 20 dB demonstrate that HuBERT achieves the highest recognition accuracy (77.8% to 95.6%). The CRN, on the other hand, improved recognition accuracy only under the -5 dB condition, and even then the accuracy reached only 80.8%.
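The sketch below illustrates speaker-dependent, template-based isolated word recognition in general terms: enrolled template feature sequences per word, an alignment cost between sequences, and a nearest-template decision. The frame features, cosine distance, and DTW alignment via librosa are assumptions for illustration, not the paper's configuration.

```python
# Template matching for isolated word recognition: pick the enrolled word
# whose template aligns most cheaply with the test utterance's features.
import numpy as np
import librosa

def dtw_distance(feat_a, feat_b):
    # feat_*: (dim, frames) arrays; cosine frame distance + DTW alignment cost
    cost = 1.0 - (feat_a.T @ feat_b) / (
        np.linalg.norm(feat_a.T, axis=1, keepdims=True)
        * np.linalg.norm(feat_b, axis=0, keepdims=True) + 1e-8)
    acc_cost, _ = librosa.sequence.dtw(C=cost)
    return acc_cost[-1, -1] / (cost.shape[0] + cost.shape[1])

def recognize(test_feat, templates):
    # templates: {word: [feature sequences enrolled by the target speaker]}
    scores = {w: min(dtw_distance(test_feat, t) for t in ts)
              for w, ts in templates.items()}
    return min(scores, key=scores.get)
```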
GCCE 2024 - 2024 IEEE 13th Global Conference on Consumer Electronics, 805-807, 2024
Recent advances in AI technology have brought not only many benefits but also considerable risks from malicious use. One key example is spoofing attacks on speaker verification systems using speech synthesis and voice conversion. To tackle this challenge, we previously proposed a two-step matching method for robust speaker verification, in which a user registers an emotion with the system in advance and is accepted only when speaking with that emotion. This method reduced the false acceptance rate but increased the false rejection rate. To overcome this problem, in this work we propose a method that integrates speaker and emotion verification scores. Experiments revealed that, by assigning an optimal weight to the speaker and emotion information contained in the speech, the proposed method reduces the equal error rate compared with the conventional method.
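As a rough illustration of the score-integration idea (not the paper's implementation), the sketch below fuses a speaker-verification score and an emotion-verification score with a weight w and measures the equal error rate; the scoring back-ends and the weight search are assumed for illustration.

```python
# Weighted fusion of speaker and emotion verification scores, plus EER.
import numpy as np

def fuse(spk_scores, emo_scores, w):
    return w * np.asarray(spk_scores) + (1.0 - w) * np.asarray(emo_scores)

def equal_error_rate(scores, labels):
    # labels: 1 = genuine trial, 0 = impostor/spoof trial
    scores, labels = np.asarray(scores), np.asarray(labels)
    thresholds = np.sort(scores)
    far = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    frr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0

# The fusion weight can be chosen on development trials, e.g.
#   w_best = min(np.linspace(0, 1, 101),
#                key=lambda w: equal_error_rate(fuse(s, e, w), y))
```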
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing (ISCSLP 2006), Lecture Notes in Artificial Intelligence, Vol. 4274, 539+, 2006
Proceedings of The Fourth International Conference on Information and The Fourth Irish Conference on the Mathematical Foundations of Computer Science and Information Technology, 345-348, 2006
Proceedings of The Fourth International Conference on Information and The Fourth Irish Conference on the Mathematical Foundations of Computer Science and Information Technology, 395-398, 2006
Proceedings of The Fourth International Conference on Information and The Fourth Irish Conference on the Mathematical Foundations of Computer Science and Information Technology, 184-188, 2006
Proceedings of The Fourth International Conference on Information and The Fourth Irish Conference on the Mathematical Foundations of Computer Science and Information Technology, 416-419, 2006
Performance degradation caused by environmental factors such as noise and reverberation is unavoidable for current state-of-the-art speech recognition, and many studies have addressed this problem. However, because different tasks and different evaluation data have been used, it has been very difficult to compare the resulting methods. For this reason, a working group on the evaluation of noisy speech recognition was organized in October 2001 under the Special Interest Group on Spoken Language Processing of the Information Processing Society of Japan, and it has created and distributed standard evaluation corpora and a standard back-end. This paper describes the current activities, future plans, and aims of this effort toward a common standardized framework for noisy speech recognition.
With recent advances in information processing technology, research on handling human sensibility by computer, a topic that had rarely been addressed in the information processing field, has become active. For anthropomorphic agents and sensibility robots to behave like humans, they must recognize human emotions and express emotions of their own; the robot "ifbot" is one example of a sensibility robot that recognizes and expresses emotions. We have been studying emotion recognition technology for application to such robots. However, emotion recognition research is still at an early stage, and few language corpora usable for emotion recognition exist. Such corpora must be constructed by hand, yet there is no unified annotation method or data format for emotion information, so the current environment is insufficient for building corpora and advancing this research. We are therefore developing a system that supports the construction of language corpora for sensibility information processing. In this paper, we propose a system that supports the creation of a natural language corpus tagged with emotion information and describe an outline of its development.