Proceedings of the 8th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, 2, 574-+, 2007
Improving the performance of speech recognition in noise is urgent for the practical use of speech recognition, and many methods have been proposed to address it. To build a standard evaluation framework for objectively comparing these methods, in October 2001 we organized the Noisy Speech Recognition Evaluation Working Group under the Special Interest Group on Spoken Language Processing of the Information Processing Society of Japan. In this paper, we first review the CENSREC (Corpus and Environment for Noisy Speech RECognition) series of standard evaluation frameworks, and then introduce CENSREC-1-C, the newest CENSREC, distributed this year, and describe its position in the series. Finally, we present our road map for designing, constructing, and distributing future evaluation frameworks.
Voice activity detection (VAD) plays an important role in speech processing under noisy environments, including speech recognition, speech enhancement, and speech coding. We developed CENSREC-1-C, an evaluation framework for VAD under noisy environments. The framework consists of continuous digit utterances spoken in noisy environments and a set of tools for evaluating VAD results. We defined two evaluation measures: a conventional frame-level detection performance measure and an utterance-level measure oriented toward speech recognition. We then show the results of evaluating a baseline power-based VAD method with these two measures.
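As an illustration of the kind of baseline described above, a power-based VAD can be sketched as follows. This is a minimal sketch, not the CENSREC-1-C baseline itself: the frame length, hop size, and the threshold of 30 dB below the utterance's maximum frame power are assumed values for illustration.

```python
import numpy as np

def power_based_vad(signal, frame_len=400, hop=160, threshold_db=-30.0):
    """Label each frame as speech (1) or non-speech (0) by log-power thresholding.

    A frame is marked as speech when its log power exceeds the maximum
    frame power in the utterance plus threshold_db (a negative offset).
    """
    log_powers = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = np.asarray(signal[start:start + frame_len], dtype=np.float64)
        power = np.mean(frame ** 2) + 1e-12  # avoid log(0) on silent frames
        log_powers.append(10.0 * np.log10(power))
    log_powers = np.array(log_powers)
    return (log_powers > log_powers.max() + threshold_db).astype(int)

# Toy usage: quiet segment followed by a loud segment
rng = np.random.default_rng(0)
sig = np.concatenate([0.001 * rng.standard_normal(8000),
                      rng.standard_normal(8000)])
labels = power_based_vad(sig)
```

Frame-level evaluation, as defined in the framework, then reduces to comparing such per-frame labels against reference labels; the utterance-level measure aggregates decisions per utterance.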
IEICE Transactions on Information and Systems, 89(3), 1074-1081, March 1, 2006
In this paper, we propose a distributed speaker recognition method using a nonparametric speaker model and the Earth Mover's Distance (EMD). In distributed speaker recognition, quantized feature vectors are sent to a server. The Gaussian mixture model (GMM), the traditional model used for speaker recognition, is trained with the maximum-likelihood approach; however, it is difficult to fit continuous density functions to quantized data. To overcome this problem, the proposed method represents each speaker model with a speaker-dependent VQ code histogram built from the registered feature vectors and directly calculates the distance between the speaker-model histograms and the test quantized feature vectors. To measure the distance between each speaker model and the test data, we use EMD, which can calculate the distance between histograms with different bins. We conducted text-independent speaker identification experiments using the proposed method. Compared with the traditional GMM, the proposed method yielded a relative error reduction of 32% on quantized data.
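The distance computation described above can be illustrated in a simplified setting. The sketch below computes EMD between two histograms whose bins need not coincide; for simplicity it treats bin locations as scalars (the paper's VQ codewords are multidimensional vectors) and uses the one-dimensional identity that EMD equals the integral of the absolute difference of the two CDFs. This is an illustration of the distance measure, not the paper's implementation.

```python
import numpy as np

def emd_1d(bins_p, weights_p, bins_q, weights_q):
    """EMD between two 1-D histograms with (possibly different) bin locations.

    For 1-D distributions with equal total mass, the EMD equals the
    integral of |CDF_p - CDF_q| over the merged support.
    """
    wp = np.asarray(weights_p, float) / np.sum(weights_p)
    wq = np.asarray(weights_q, float) / np.sum(weights_q)
    bp = np.asarray(bins_p, float)
    bq = np.asarray(bins_q, float)
    # Merge both sets of bin locations into one sorted support
    support = np.union1d(bp, bq)
    cdf_p = np.cumsum([wp[bp == x].sum() for x in support])
    cdf_q = np.cumsum([wq[bq == x].sum() for x in support])
    # Integrate the absolute CDF difference between adjacent support points
    return float(np.sum(np.abs(cdf_p - cdf_q)[:-1] * np.diff(support)))
```

For vector-valued bins, as in the VQ code histograms above, the same idea is posed as a transportation problem with a Euclidean ground-distance matrix between codewords.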
In recent years, several methods for human emotion recognition have been published. In this paper, we propose a scheme that applies an emotion classification technique to emotion recognition. The classification model is the Support Vector Machine (SVM). SVMs have become an increasingly popular tool for machine learning tasks involving classification, regression, and novelty detection. The emotion recognition system recognizes the emotion of a sentence typed on the keyboard. Training and test sets were constructed to verify the effectiveness of this model. Experiments showed that the method achieves good results in practice and has potential in the emotion recognition field.
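As an illustration of the classification step, the sketch below trains a linear SVM by hinge-loss sub-gradient descent on binary bag-of-words features. The toy sentences, vocabulary construction, and hyperparameters are invented for illustration; the paper's system may use different features and an off-the-shelf SVM implementation.

```python
import numpy as np

def bag_of_words(sentences, vocab):
    """Binary bag-of-words features over a fixed word-to-index vocabulary."""
    X = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for w in s.lower().split():
            if w in vocab:
                X[i, vocab[w]] = 1.0
    return X

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via sub-gradient descent on the hinge loss (labels in {-1,+1})."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1.0  # points violating the margin contribute a sub-gradient
        if mask.any():
            grad_w = lam * w - (y[mask, None] * X[mask]).mean(axis=0)
            grad_b = -y[mask].mean()
        else:
            grad_w = lam * w
            grad_b = 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy two-class example: positive vs. negative emotion (labels are illustrative)
train = ["i am so happy today", "this is wonderful", "i feel sad", "this is terrible"]
labels = np.array([1, 1, -1, -1])
vocab = {w: i for i, w in enumerate(sorted({w for s in train for w in s.split()}))}
X = bag_of_words(train, vocab)
w, b = train_linear_svm(X, labels)
pred = np.sign(bag_of_words(["so wonderful"], vocab) @ w + b)
```

Multi-class emotion recognition, as in the paper, is typically handled by combining such binary SVMs in a one-vs-rest or one-vs-one scheme.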