SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 1828-1834, 2008, peer-reviewed
Recently, speech recognition performance has been drastically improved by statistical methods and huge speech databases, and attention is now shifting to performance improvement in realistic environments such as noisy conditions. Since October 2001, our working group of the Information Processing Society of Japan has been working on evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks including databases and evaluation tools: CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected-digit recognition), CENSREC-3 (in-car isolated-word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we introduce a new collection of databases and evaluation tools named CENSREC-4, an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for hands-free speech interfaces, so we measured room impulse responses to investigate reverberant speech recognition. Evaluation experiments showed that CENSREC-4 is an effective database for evaluating new dereverberation methods, because traditional dereverberation processing had difficulty sufficiently improving recognition performance. The framework was released in March 2008, and many studies in Japan are being conducted with it.
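The reverberant-speech setting above can be sketched in code: distant-talking data of the kind CENSREC-4 targets is commonly simulated by convolving clean speech with a measured room impulse response. A minimal sketch; the exponentially decaying impulse response below is purely illustrative, not a CENSREC-4 measurement:

```python
import numpy as np

def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a clean speech signal with a room impulse response,
    truncated to the original length and peak-normalized."""
    wet = np.convolve(clean, rir)[: len(clean)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet

# toy example: a noise-excited, exponentially decaying impulse response
fs = 16000
t = np.arange(int(0.3 * fs)) / fs
rir = np.exp(-t / 0.05) * np.random.default_rng(0).standard_normal(t.size)
rir[0] = 1.0  # direct path
clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
wet = reverberate(clean, rir)
```

In a real evaluation the impulse response would be one of the measured responses distributed with the corpus rather than synthetic noise.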
PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION, 478-481, 2008, peer-reviewed
In sign language, hand positions and movements convey the meanings of words. We have therefore been developing sign language recognition methods that use both hand position and hand movement. In previous studies, however, each feature had the same weight when calculating the probability for recognition. In this study, we propose a sign language recognition method using a multi-stream HMM technique to show the relative importance of position and movement information for sign language recognition. We conducted recognition experiments using 21,960 sign language word data. As a result, 75.6% recognition accuracy was obtained with the appropriate weights (position:movement = 0.2:0.8), while 70.6% was obtained with equal weights. From this result, we conclude that hand movement is more important for sign language recognition than hand position. In addition, we conducted experiments to determine the optimal numbers of states and mixtures; the best accuracy was obtained with 15 states and two mixtures per word HMM.
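The stream-weighting idea can be sketched as follows: each word HMM yields one log-likelihood per stream, and the streams are combined as a weighted sum in the log domain. The log-likelihood values below are hypothetical, not taken from the experiments:

```python
import numpy as np

def multistream_score(ll_position, ll_movement, w_pos=0.2, w_mov=0.8):
    """Combine per-word log-likelihoods of the position and movement
    streams with stream weights (a weighted sum in the log domain)."""
    return w_pos * np.asarray(ll_position) + w_mov * np.asarray(ll_movement)

# hypothetical log-likelihoods of three word HMMs for one test sample
ll_pos = [-120.0, -118.0, -125.0]
ll_mov = [-200.0, -180.0, -210.0]
word_equal = int(np.argmax(multistream_score(ll_pos, ll_mov, 0.5, 0.5)))
word_tuned = int(np.argmax(multistream_score(ll_pos, ll_mov, 0.2, 0.8)))
```

Raising the movement weight lets the movement stream dominate the decision, which is what the 0.2:0.8 setting above exploits.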
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 1929-+, 2008, peer-reviewed
Recently, new sensors such as bone-conductive microphones, throat microphones, and non-audible murmur (NAM) microphones have been developed for collecting speech data, in addition to conventional condenser microphones. Accordingly, researchers have begun to study speaker and speech recognition using speech data collected by these new sensors. We focus on bone-conduction speech data collected by a bone-conductive microphone. This paper proposes a novel speaker identification method that combines bone-conduction speech and air-conduction speech. The proposed method conducts speaker identification by integrating the similarity calculated from an air-conduction speech model with the similarity calculated from a bone-conduction speech model. For evaluation, we conducted speaker identification experiments using part of a large bone-conduction speech corpus constructed by the National Research Institute of Police Science, Japan (NRIPS). Experimental results show that the proposed method reduces the identification error rate compared with using air-conduction or bone-conduction speech alone; in particular, the average error reduction rate relative to air-conduction speech is 35.8%.
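The similarity-integration step might look like the following score-level fusion; the weight alpha and the per-speaker scores are hypothetical, and the paper's actual integration rule may differ:

```python
import numpy as np

def fuse_scores(score_air, score_bone, alpha=0.5):
    """Score-level fusion: a weighted sum of per-speaker similarities from
    the air-conduction model and the bone-conduction model."""
    return alpha * np.asarray(score_air) + (1 - alpha) * np.asarray(score_bone)

# hypothetical similarity scores of three enrolled speakers for one utterance
air_scores = [0.62, 0.70, 0.55]
bone_scores = [0.80, 0.60, 0.50]
identified = int(np.argmax(fuse_scores(air_scores, bone_scores)))  # speaker index
```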
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 545-+, 2008, peer-reviewed
In this paper, we describe a speech re-synthesis tool that uses the fundamental frequency (F0) generation model proposed by Fujisaki et al. and STRAIGHT, designed by Kawahara, and that can be used for listening experiments in which F0 model parameters are modified. To create the tool, we first established a method for automatically estimating F0 model parameters using genetic algorithms. Next, we combined the proposed method with STRAIGHT. With the tool, we can change the prosody of input speech by manually modifying the F0 model parameters and evaluate the relation between human perception and the F0 model parameters. We confirmed that this tool can produce natural speech data with various prosodic parameters.
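For reference, the Fujisaki model expresses log F0 as a baseline value plus phrase and accent components. A minimal sketch of the generation side; the time constants and command values below are illustrative defaults, not the paper's estimates:

```python
import numpy as np

def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Generate ln F0(t) with the Fujisaki model:
    ln F0(t) = ln Fb + sum Ap*Gp(t-T0) + sum Aa*(Ga(t-T1) - Ga(t-T2)),
    where Gp is the phrase-control and Ga the accent-control response."""
    def gp(x):
        x = np.maximum(x, 0.0)
        return alpha**2 * x * np.exp(-alpha * x)
    def ga(x):
        x = np.maximum(x, 0.0)
        return np.minimum(1 - (1 + beta * x) * np.exp(-beta * x), gamma)
    lnf0 = np.full_like(t, np.log(fb))
    for ap, t0 in phrases:          # phrase commands (magnitude, onset time)
        lnf0 += ap * gp(t - t0)
    for aa, t1, t2 in accents:      # accent commands (amplitude, onset, offset)
        lnf0 += aa * (ga(t - t1) - ga(t - t2))
    return lnf0

t = np.linspace(0.0, 2.0, 200)
lnf0 = fujisaki_f0(t, fb=120.0, phrases=[(0.5, 0.0)], accents=[(0.3, 0.3, 0.8)])
f0 = np.exp(lnf0)
```

A genetic algorithm of the kind used in the tool would search over these command parameters to minimize the error between generated and observed F0 contours.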
Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, AIA 2008, 121-125, 2008
Our research aims to develop a robot that can hold smooth conversations with human beings. Such a robot needs the ability to understand and interpret words. Currently, techniques based on large-scale language dictionaries or corpora predominate in the field of language processing. Since considerable cost, resources, and time are necessary to create such linguistic capital, automatic construction techniques are also being researched. However, common-sense knowledge is inherent to humans and difficult to construct automatically, although it is indispensable for a robot to converse with human beings without causing a sense of unease. In this paper, we propose a technique that contributes to the semi-automatic construction of a large-scale language dictionary and corpus. Concretely, a system using the proposed technique indicates the position at which an unknown word should be registered in an existing thesaurus dictionary. The proposed technique improved accuracy by approximately 20% compared with traditional techniques.
IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING 3(1), 106-112, January 2008, peer-reviewed
Research on Chinese-Japanese machine translation has continued for many years, and the field is now increasingly refined. Practical machine translation systems handle simple, short Chinese sentences reasonably well, but the translation of complex, long Chinese sentences remains difficult. For example, these systems are still unable to solve the translation problem of complex 'BA' sentences. In this article, a new method of parsing 'BA' sentences for machine translation, based on valency theory, is proposed. A 'BA' sentence is one that contains the prepositional word 'BA'. Its structural characteristic is that the verb follows the object word: the object word after the 'BA' preposition serves as an adverbial modifier of an active word. First, a large number of grammar items were collected from Chinese grammar books, and elementary judgment rules were set by classifying the collected items. These judgment rules were then applied to actual Chinese text and refined by checking their results, using statistical information from an actual corpus. A five-segment model for 'BA' sentence translation was then derived from the above analysis. Finally, we applied the proposed model in our machine translation system and evaluated the results. It achieved a 91.3% accuracy rate, and this satisfying result verified the effectiveness of our five-segment model for 'BA' sentence translation. (C) 2007 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
International Journal for Human Caring Vol.12(No.1), 7-16, January 2008, peer-reviewed
We presume that measuring electroencephalographic (EEG) changes, which are considered physiological indicators, enables an objective understanding of changes in the emotions of people who have difficulty expressing them through facial expression or physical action. Generally, EEG is used in hospitals to examine encephalopathy and brain disorders. Using an electroencephalograph to acquire digital data, we propose a method to objectively capture changes in a person's recognition state from changes in EEG activity (action potentials), and a way to apply it in clinical situations.
INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL 11(1), 55-68, January 2008, peer-reviewed
We aim to develop a mechanism that can sympathize with people, and we attempt to read the feelings a person experiences from brain waves. As an initial stage of this research, we verify whether it can be judged from brain waves that a subject is impressed. Concretely, using an electroencephalograph (EEG), we investigate whether the brain is in an active state when the subject declares having been impressed. Three evaluation methods are used: one statistically evaluates the strength of the potential; another objectively evaluates the brain regions where activity occurs; and the third compares the subject's subjective reports with changes in the EEG. Since there were only two subjects this time and their attributes are limited, questions remain about the validity of the results. However, the results do make clear that a subject's state of being impressed can be judged from brain-wave activity.
Research in Computing Science Vol.32, 330-340, November 2007, peer-reviewed
There have been studies on spoken natural-language dialog, and most systems have been developed successfully within specified domains. However, current human-computer interfaces only take in data to drive their programs. Aiming to develop an affective dialog system, we have been exploring how to incorporate the emotional aspects of dialog into existing dialog-processing techniques. As a preliminary step toward this goal, we build a Chinese emotion classification model that recognizes the main affective attribute of a sentence or text. Finally, we conducted experiments to evaluate our model.
Peilin Jiang, Ran Li, Fuji Ren, Shingo Kuroiwa, Nanning Zheng
Research in Computing Science Vol.32, 374-381, November 2007, peer-reviewed
Human-computer interface technology faces the challenge of actively understanding the user's mind. Speaker detection is a primary technique in human-computer interface (HCI) applications and in other applications such as surveillance systems, video conferencing, and multimedia database management in computer vision and speech recognition. This paper describes a novel method to detect a speaker with a probabilistic model of speaking behavior. After face recognition, special components under a nonlinear transformation in the color space of the lips identify the specific mouth region, which is then combined with groups of coherent motions. Next, the simple movements in the mouth region are modeled by hidden Markov models. The experimental results demonstrate that the model representing speaking is efficient and successful when applied to a driver video-surveillance system.
Mohamed Abdel Fattah, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
COMPUTER SPEECH AND LANGUAGE 21(4), 594-608, October 2007, peer-reviewed
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present two new approaches to aligning English-Arabic sentences in bilingual parallel corpora, based on probabilistic neural network (P-NNT) and Gaussian mixture model (GMM) classifiers. A feature vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the probabilistic neural network and Gaussian mixture model; another set was used for testing. Using the probabilistic neural network and Gaussian mixture model approaches, we achieved error reductions of 27% and 50%, respectively, over the length-based approach when applied to a set of parallel English-Arabic documents. In addition, the results of the P-NNT and GMM outperform those of the combined model, which exploits length, punctuation, and cognates in a dynamic framework. The GMM approach also outperforms Melamed's and Moore's approaches. Moreover, these new approaches are valid for any language pair and are quite flexible, since the feature vector may contain more, fewer, or different features than the ones used in the current research, such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts. (c) 2007 Elsevier Ltd. All rights reserved.
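The feature extraction step that feeds such classifiers can be sketched as follows. The length-ratio, punctuation, and cognate features here are simplified stand-ins for the paper's exact definitions; the toy cognate score simply counts shared tokens such as numerals:

```python
import re

def alignment_features(en: str, ar: str):
    """Feature vector for a candidate English/Arabic sentence pair:
    character-length ratio, punctuation agreement, and a crude cognate
    score based on shared tokens (numbers, names). Illustrative only."""
    len_ratio = min(len(en), len(ar)) / max(len(en), len(ar), 1)
    shared_punct = set(re.findall(r"[.!?;:,]", en)) & set(re.findall(r"[.!?;:,]", ar))
    punct_score = len(shared_punct)
    en_tok = set(re.findall(r"\w+", en.lower()))
    ar_tok = set(re.findall(r"\w+", ar.lower()))
    cognate = len(en_tok & ar_tok) / max(len(en_tok | ar_tok), 1)
    return [len_ratio, punct_score, cognate]

feats = alignment_features("UN resolution 1441 passed in 2002.",
                           "قرار 1441 صدر في 2002.")
```

A classifier (P-NNT or GMM) would then be trained on such vectors labeled as aligned or non-aligned pairs.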
SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 2, PROCEEDINGS, 574-+, 2007, peer-reviewed
The rapid growth of the Internet has produced enormous amounts of information that have become difficult to access efficiently. The primary goal of this research is to create an efficient tool that can summarize large documents automatically. We propose concept chains, which link semantically related concepts based on the HowNet knowledge database, to improve the performance of text summarization and to suit Chinese text. Lexical chains are a technique for identifying semantically related terms in text. The resulting concept chains are used to identify candidate sentences useful for extraction. We also propose another method, based on structural features, that makes the summary more general in content and more balanced. The final experimental results proved the effectiveness of our methods.
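The chaining-then-extraction idea can be sketched with a toy concept lookup standing in for the HowNet knowledge base; the concept table and sentences below are invented for illustration:

```python
from collections import defaultdict

# A toy concept lookup standing in for the HowNet knowledge base.
CONCEPT = {"economy": "finance", "market": "finance", "stocks": "finance",
           "rain": "weather", "storm": "weather"}

def build_concept_chains(sentences):
    """Link semantically related words into concept chains, then return
    the sentence that contributes most to the strongest chain
    (a much-simplified extractive step)."""
    chains = defaultdict(list)
    for i, sent in enumerate(sentences):
        for word in sent.lower().split():
            concept = CONCEPT.get(word.strip(".,"))
            if concept:
                chains[concept].append(i)
    if not chains:
        return None
    strongest = max(chains, key=lambda c: len(chains[c]))
    counts = defaultdict(int)
    for i in chains[strongest]:
        counts[i] += 1
    return sentences[max(counts, key=counts.get)]

summary = build_concept_chains([
    "The storm brought heavy rain.",
    "The market fell as stocks dropped and the economy slowed.",
])
```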
MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE 4827, 1046-+, 2007, peer-reviewed
Human-computer interaction (HCI) technology has emerged in different fields of computer vision and recognition systems, such as virtual environments, video games, e-business, and multimedia management. In this paper, we propose a framework for modeling the mental state transitions (MST) of a human being or virtual character. Human emotions can easily be observed in facial expressions, gestures, sound, and other visual characteristics, but the underlying mental state transitions in affective data are hidden. We analyze the MST framework, employ dynamic Bayesian networks (DBNs) to construct the MST networks, and finally implement an experiment to derive the ground truth of the data and verify the framework's effectiveness.
MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE 4827, 1035-+, 2007, peer-reviewed
Emotion recognition aims to make computers understand the ambiguous information of human emotion. Recently, research on emotion recognition has been actively progressing in various fields, such as natural language processing, speech signal processing, image processing, and brain-wave analysis. We propose a method to recognize emotion in dialogue text using an originally created Emotion Word Dictionary. The words in the dictionary are weighted according to their occurrence rates in an existing emotion-expression dictionary. We also propose a method to judge the object of an emotion and the emotion expressivity in dialogue sentences. An experiment using 1,190 sentences showed an accuracy of about 80%.
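The weighted-dictionary lookup at the core of such a method can be sketched as follows; the dictionary entries, categories, and weights below are invented for illustration, not entries of the actual Emotion Word Dictionary:

```python
# A tiny hypothetical emotion-word dictionary: word -> (category, weight),
# where the weight mimics occurrence-rate weighting.
EMOTION_DICT = {
    "happy": ("joy", 0.9), "glad": ("joy", 0.7),
    "angry": ("anger", 0.9), "furious": ("anger", 1.0),
    "sad": ("sorrow", 0.8),
}

def recognize_emotion(sentence: str):
    """Sum dictionary weights per emotion category and return the
    highest-scoring category (None if no emotion word is found)."""
    scores = {}
    for token in sentence.lower().split():
        token = token.strip(".,!?")
        if token in EMOTION_DICT:
            emotion, weight = EMOTION_DICT[token]
            scores[emotion] = scores.get(emotion, 0.0) + weight
    return max(scores, key=scores.get) if scores else None

label = recognize_emotion("I was so happy and glad to see you")
```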
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 1045-1048, 2007, peer-reviewed
In this paper, we propose a non-realtime speech bandwidth extension method using HMM-based speech recognition and HMM-based speech synthesis. In the proposed method, the phoneme-state sequence is first estimated from the band-limited speech signals using the speech recognition technique. Next, to estimate the spectrum envelopes of the lost high-frequency components, an HMM-based speech synthesis technique generates a synthetic speech signal (spectrum sequence) according to the predicted phoneme-state sequence. Since both the speech recognition and the speech synthesis take dynamic feature vectors into account, we obtain a smoothly varying spectrum sequence. To evaluate the proposed method, we conducted subjective and objective experiments. The experimental results show the effectiveness of the proposed method for bandwidth extension, although its speech quality still needs improvement.
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 607-+, 2007, peer-reviewed
Voice activity detection (VAD) plays an important role in speech processing, including speech recognition, speech enhancement, and speech coding in noisy environments. We developed an evaluation framework for VAD in such environments, called Corpus and Environment for Noisy Speech Recognition 1 Concatenated (CENSREC-1-C). This framework consists of noisy continuous digit utterances and evaluation tools for VAD results. By adopting two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance, we provide the evaluation results of a power-based VAD method as a baseline. When VAD is used in a speech recognizer, the detected speech segments are extended to avoid the loss of speech frames, and the pause segments are then absorbed by a pause model. We investigate the balance between explicit segmentation by VAD and implicit segmentation by a pause model through an experimental simulation of segment extension, and show that a small extension improves speech recognition.
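A power-based VAD baseline with segment extension can be sketched as follows; the frame sizes, threshold, and extension width are illustrative choices, not the CENSREC-1-C baseline settings:

```python
import numpy as np

def power_vad(signal, frame_len=400, hop=160, threshold_db=-30.0, extend=2):
    """Frame-level power VAD: mark frames whose log power exceeds a
    threshold relative to the peak frame, then extend each speech region
    by a few frames on both sides (the segment extension discussed above)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    power = np.array([
        np.mean(signal[i * hop:i * hop + frame_len] ** 2) + 1e-12
        for i in range(n_frames)
    ])
    db = 10 * np.log10(power / power.max())
    speech = db > threshold_db
    extended = speech.copy()
    for i in np.where(speech)[0]:
        extended[max(0, i - extend): i + extend + 1] = True
    return extended

# toy signal: noise, then a tone standing in for speech, then noise (16 kHz)
rng = np.random.default_rng(0)
sig = np.concatenate([0.001 * rng.standard_normal(8000),
                      np.sin(2 * np.pi * 200 * np.arange(8000) / 16000),
                      0.001 * rng.standard_normal(8000)])
frames = power_vad(sig)
```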
PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 361-+, 2007, peer-reviewed
This paper describes a question answering system for the Confucian Analects. Because of context change and differences in word connotations between modern Chinese and ancient Chinese, the accuracy of content-based and category-based retrieval in classical literature is quite low. In view of this, we established a category and pragmatics information base for the Confucian Analects and propose a retrieval method based on pragmatics information and categories. To increase accuracy and efficiency, a category keyword collection and a question-type keyword table were also established. When the system recognizes the type and category of a user's question, it uses keyword semantic matching: the category keyword collection and the question-type keyword table are used separately to decide the category and the type. The experiments demonstrated the effectiveness of the answer-extraction approach based on pragmatics information, particularly for queries with deep meaning.
INTERNATIONAL JOURNAL OF NEURAL SYSTEMS 16(6), 423-434, December 2006, peer-reviewed
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on a feed-forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the feed-forward neural network, and another set was used for testing. Using this new approach, we achieved an error reduction of 60% over the length-based approach when applied to English-Arabic parallel documents. Moreover, this new approach is valid for any language pair and is quite flexible, since the feature parameter vector may contain more, fewer, or different features than those used in our system, such as a lexical matching feature.
WSEAS Transactions on Computers 5(9), 1880-1885, September 2006
WordNet has become a standard tool in the NLP researcher's toolkit. While it provides a plethora of information, it lacks certain information that would be of great benefit. This paper examines building frames of knowledge for a subset of causal agents in WordNet. This extra knowledge can help in Question Answering, Machine Translation, etc. After an examination of the WordNet glosses, different classes were created that allow for obtaining knowledge about actions, attributes, and domains.
INFORMATION PROCESSING & MANAGEMENT 42(4), 1003-1016, July 2006, peer-reviewed
Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications, because a word often conveys complex meanings decomposable into several morphemes (i.e., prefix, stem, suffix). By segmenting words into morphemes, we can improve the extraction of English/Arabic translation pairs from parallel texts. This paper describes two algorithms, and their combination, to automatically extract an English/Arabic bilingual dictionary from parallel texts that exist in the Internet archive, after using an Arabic light stemmer as a preprocessing step. Before applying the Arabic light stemmer, the total system precision and recall were 88.6% and 81.5%, respectively; after applying it to the Arabic documents, precision and recall increased to 91.6% and 82.6%, respectively.
The algorithms have certain variables whose values can be changed to control the system's precision and recall. As with most such systems, the accuracy of our system is directly proportional to the number of sentence pairs used. However, our system is able to extract translation pairs from a very small parallel corpus: it can extract translations from as few as two sentences in each language, provided the system's requirements are met. Moreover, the system can extract word pairs that are translations of each other, synonyms, and the explanation of a word in the other language. By controlling the system variables, we could achieve 100% precision for the output bilingual dictionary, at the cost of a small recall. (c) 2005 Elsevier Ltd. All rights reserved.
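The co-occurrence scoring behind such extraction can be sketched with a Dice coefficient over sentence-aligned text. This is a simplified stand-in for the paper's two algorithms, and the romanized toy data is invented; the threshold plays the role of the tunable variables mentioned above:

```python
from collections import defaultdict

def extract_pairs(aligned, threshold=0.8):
    """Score candidate word translations from sentence-aligned text with
    the Dice coefficient: 2*cooc(e,a) / (count(e) + count(a))."""
    count_e, count_a, cooc = defaultdict(int), defaultdict(int), defaultdict(int)
    for en_sent, ar_sent in aligned:
        en_words, ar_words = set(en_sent.split()), set(ar_sent.split())
        for e in en_words:
            count_e[e] += 1
        for a in ar_words:
            count_a[a] += 1
        for e in en_words:
            for a in ar_words:
                cooc[(e, a)] += 1
    return {
        (e, a): 2 * c / (count_e[e] + count_a[a])
        for (e, a), c in cooc.items()
        if 2 * c / (count_e[e] + count_a[a]) >= threshold
    }

pairs = extract_pairs([
    ("the book", "kitab"),
    ("the pen", "qalam"),
    ("a book", "kitab"),
])
```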
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D(6), 1848-1859, June 2006, peer-reviewed
In this paper, we propose a web-based question answering (QA) system for a restricted domain, which combines three information resources for retrieval: a question-answer database, a special-domain document database, and web resources retrieved by the Google search engine. We describe a new retrieval technique that integrates a probabilistic technique based on Okapi BM25 with a semantic analysis based on the ontology of the HowNet knowledge base and a special-domain HowNet created for the restricted domain. Furthermore, we provide a method of question expansion by computing word semantic similarity. The system was first developed for a middle-sized domain of sightseeing information. The experiments proved the efficiency of our method for a restricted domain and showed that it can be transferred to other domains expediently.
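The Okapi BM25 component can be sketched directly from its standard formula; the sightseeing-style toy documents and the parameter values k1 = 1.2, b = 0.75 are illustrative:

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 relevance of `doc` to `query`, given the collection
    `docs`; each document is a list of tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [
    "temple entrance fee schedule".split(),
    "castle opening hours and entrance fee".split(),
    "local cuisine restaurants".split(),
]
query = "entrance fee".split()
best = max(range(len(docs)), key=lambda i: bm25_score(query, docs[i], docs))
```

In the full system this probabilistic score would be combined with the HowNet-based semantic analysis rather than used alone.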
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D(5), 1712-1719, May 2006, peer-reviewed
In the European Telecommunications Standards Institute (ETSI) Distributed Speech Recognition (DSR) front-end, the distortion added by feature compression on the front-end side increases the variance-flooring effect, which in turn increases the identification error rate. The penalty incurred in reducing the bit rate is degraded speaker recognition performance. In this paper, we present a nontraditional solution to this problem. To reduce the bit rate, a speech signal is segmented at the client, and the phonemes most effective for speaker recognition (determined according to their type and frequency) are selected and sent to the server, where speaker recognition takes place. Applying this approach to the YOHO corpus, we achieved an identification error rate (ER) of 0.05% using an average of 20.4% of each testing utterance in a speaker identification task, and an equal error rate (EER) of 0.42% using an average of 15.1% of each testing utterance in a speaker verification task.
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D(3), 1074-1081, March 2006, peer-reviewed
In this paper, we propose a distributed speaker recognition method using a nonparametric speaker model and the Earth Mover's Distance (EMD). In distributed speaker recognition, quantized feature vectors are sent to a server. The Gaussian mixture model (GMM), the traditional method used for speaker recognition, is trained with the maximum likelihood approach, but it is difficult to fit continuous density functions to quantized data. To overcome this problem, the proposed method represents each speaker model with a speaker-dependent VQ code histogram built from registered feature vectors and directly calculates the distance between the speaker-model histograms and the testing quantized feature vectors. To measure this distance, we use EMD, which can calculate the distance between histograms with different bins. We conducted text-independent speaker identification experiments using the proposed method. Compared to results using the traditional GMM, the proposed method yielded relative error reductions of 32% on quantized data.
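For histograms over the same ordered bins, EMD has a simple closed form: the L1 distance between cumulative distributions. The paper's setting is more general (histograms with different bins, solved as a transportation problem), but the 1-D sketch conveys the idea; the VQ-code histograms below are hypothetical:

```python
import numpy as np

def emd_1d(hist_p, hist_q):
    """Earth Mover's Distance between two histograms over the same
    ordered bins: for 1-D it reduces to the L1 distance between CDFs."""
    p = np.asarray(hist_p, dtype=float)
    q = np.asarray(hist_q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.abs(np.cumsum(p - q)).sum())

# hypothetical VQ-code histograms: a registered speaker model vs. a test utterance
speaker_model = [4, 3, 2, 1]
test_utterance = [1, 2, 3, 4]
dist_same = emd_1d(speaker_model, speaker_model)
dist_diff = emd_1d(speaker_model, test_utterance)
```

Identification then picks the registered speaker whose model histogram has the smallest EMD to the test histogram.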
PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 426-429, 2006, peer-reviewed
The large amount of lengthy on-line information does not fit mobile devices. To solve this problem, we propose a method that collects original news texts from on-line sources and automatically extracts summary sentences from them. On this basis, we adopt WML (Wireless Markup Language) to build a news website that lets mobile devices browse the news summaries. The system mainly consists of automatic news collection and automatic text summarization. Our experimental results proved the effectiveness of this approach.
PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 370-373, 2006, peer-reviewed
In the present study, we present different approaches for extracting transliterated proper-noun pairs from parallel corpora, based on different similarity measures between the English and Romanized Arabic proper nouns under consideration. The strength of our new system is that it works well for low-frequency words. We evaluate the presented approaches using an English-Arabic parallel corpus. Most of our results outperform previously published results in terms of precision, recall, and F-measure.
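One plausible similarity measure of the kind compared in such work is a normalized edit distance between the English spelling and the romanization; this is an illustrative measure, not necessarily one of the paper's:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def transliteration_similarity(en: str, romanized_ar: str) -> float:
    """Similarity in [0, 1] between an English proper noun and its
    Romanized Arabic counterpart: 1 - normalized edit distance."""
    a, b = en.lower(), romanized_ar.lower()
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

sim = transliteration_similarity("Mohamed", "Muhammad")
```

Candidate pairs scoring above a threshold would then be kept, which is why such measures can work even for names seen only once or twice.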
CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS 4274, 539-+, 2006, peer-reviewed
In this paper, we present the evaluation results of our proposed text-independent speaker recognition method based on the Earth Mover's Distance (EMD), using the ISCSLP2006 Chinese speaker recognition evaluation corpus developed by the Chinese Corpus Consortium (CCC). EMD-based speaker recognition (EMD-SR) was originally designed for a distributed speaker identification system, in which the feature vectors are compressed by vector quantization at a terminal and sent to a server that executes the pattern matching. In this structure, speaker models must be trained on quantized data, so we utilized a nonparametric speaker model and EMD. In experiments on a Japanese speech corpus, EMD-SR showed higher robustness to quantized data than the conventional GMM technique, and it achieved higher accuracy than the GMM even when the data were not quantized. Hence, we took on the ISCSLP2006 speaker recognition evaluation using EMD-SR. Since the identification tasks defined in the evaluation were on an open-set basis, we introduce a new speaker verification module in this paper. Evaluation results showed that EMD-SR achieves a 99.3% identification correctness rate in a closed-channel speaker identification task.
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 1105-1108, 2006, peer-reviewed
In recent years, IP telephone service has spread rapidly. However, an unavoidable problem of IP telephone service is the deterioration of speech due to packet loss, which often occurs on wireless networks. To overcome this problem, we propose a novel lost-speech reconstruction method using speech recognition based on Missing Feature Theory and HMM-based speech synthesis. The proposed method uses linguistic information and can handle the loss of syllable units, which conventional methods cannot. We conducted subjective and objective evaluation experiments under speaker-independent conditions, and the results showed the effectiveness of the proposed method. Although the proposed method incurs a processing delay, we believe it will open up new applications for speech recognition and speech synthesis technology.
Quantifiers and numerals often cause mistakes in Chinese-Japanese machine translation. In this paper, an approach based on syntactic features after classification is proposed. Using the differences in the type and position of quantifiers between Chinese and Japanese, quantifier translation rules were acquired, and an evaluation was conducted using them. Finally, the adaptability to the experimental data was verified: the methods achieved an accuracy of 90.75%, showing that they are effective in processing quantifiers and numerals.
In this paper, we build a Japanese emotion corpus and perform statistical analysis on it. We manually entered about 1,200 example dialogue sentences and collected statistical information from the corpus to analyze the way emotion is expressed in Japanese dialogue. Such statistics should prove useful for dealing with emotion in natural language, and we believe they accurately describe emotion in Japanese dialogue.
Authors of news stories, through their choice of words and phrasing, inject an underlying emotion into their stories. A story about the same event or person can have radically different emotions depending on the author, newspaper, and nationality. In this paper, we propose a system to judge the emotion of a news article based on emotion word, idiom, and modifier dictionaries. Such a system allows one to gauge world opinion on varying topics by examining the emotion used within news articles about each topic.
Conventional approaches to emotion estimation from text mainly target superficial emotion expressions. However, an utterance may convey emotion even when it contains no explicit emotion expressions. In this paper, we propose an emotion estimation algorithm for conversational sentences. We assigned emotion-occurrence rules to 1,616 sentence patterns and developed a dictionary of emotional words and emotional idioms. The proposed method estimates the emotions in a sentence by matching it against the sentence patterns and their emotion-occurrence rules. Furthermore, it can extract two or more emotions from a single sentence by calculating emotion parameters. We built an experimental system based on the proposed method, analyzed weblog data containing 253 sentences with it, and evaluated the emotion estimation accuracy. As a result, we obtained an estimation accuracy of about 60%.
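The pattern-matching step above can be sketched as follows. The two regex patterns and the tiny word dictionary are illustrative stand-ins for the paper's 1,616 sentence patterns and its full lexicon; note how one sentence can yield more than one emotion.

```python
import re

# Illustrative (pattern, emotion) rules standing in for the paper's 1,616 patterns.
RULES = [
    (re.compile(r"\b(won|passed|succeeded)\b"), "joy"),
    (re.compile(r"\b(lost|failed|broke)\b"), "sadness"),
]
# Tiny stand-in for the emotional word/idiom dictionary.
EMOTION_WORDS = {"happy": "joy", "sad": "sadness"}

def estimate_emotions(sentence):
    """Return the set of emotions triggered by pattern rules or dictionary words."""
    emotions = set()
    for pattern, emotion in RULES:
        if pattern.search(sentence):
            emotions.add(emotion)
    for word in sentence.split():
        if word in EMOTION_WORDS:
            emotions.add(EMOTION_WORDS[word])
    return emotions
```

A sentence such as "I passed the exam but lost my wallet" matches both rules, so the method returns both joy and sadness, mirroring the multi-emotion output described above.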
In this paper we present a semantic analyzer for aiding emotion recognition in Chinese. The analyzer uses a decision tree to assign semantic dependency relations between headwords and modifiers, achieving an accuracy of 83.5%. The semantic information is combined with rules for Chinese verbs that convey emotion to describe the emotions of the people mentioned in a sentence. The rules specify how to assign emotion to agents, receivers, and other roles depending on the verb.
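A decision tree over headword-modifier features can be illustrated with a tiny hand-written tree. The features (POS tags, signed word distance) and relation labels below are assumptions for illustration; the paper's tree is learned from data, not written by hand.

```python
def assign_relation(head_pos, mod_pos, distance):
    """Toy decision tree assigning a dependency relation to a (headword, modifier) pair.

    distance: signed offset of the modifier relative to the head
    (negative = modifier precedes the head).
    """
    if head_pos == "VERB":
        if mod_pos == "NOUN":
            # A noun before the verb is treated as the agent, after it as the patient.
            return "agent" if distance < 0 else "patient"
        return "manner"
    if head_pos == "NOUN" and mod_pos == "ADJ":
        return "attribute"
    return "other"
```

Once such relations are assigned, emotion rules keyed on the verb can propagate emotion to whichever word fills the agent or receiver slot.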
In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on the linguistic information of the text pair in a Gaussian mixture model (GMM) classifier. A feature vector is extracted from the text pair under consideration, containing features such as length, punctuation score, cognate score, and a bilingual lexicon extracted from the parallel corpus. A set of manually prepared training data was used to train the Gaussian mixture model, and another set was used for testing. Using the GMM approach, we achieved an error reduction of 160% over the length-based approach when applied to English-Arabic parallel documents. In addition, the GMM results outperform those of a combined model that exploits length, punctuation, cognates, and a bilingual lexicon in a dynamic framework.
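The classification step can be sketched with a simplified model: one diagonal Gaussian per class ("aligned" vs. "not aligned") rather than a full mixture, fit to feature vectors such as (length ratio, cognate score). This is a minimal sketch of the idea, not the paper's GMM configuration.

```python
import math

def fit_gaussian(samples):
    """Per-dimension mean and variance for one class (single diagonal Gaussian)."""
    n, dim = len(samples), len(samples[0])
    means = [sum(x[d] for x in samples) / n for d in range(dim)]
    vars_ = [sum((x[d] - means[d]) ** 2 for x in samples) / n + 1e-6
             for d in range(dim)]  # variance floor for numerical safety
    return means, vars_

def log_likelihood(x, means, vars_):
    """Diagonal-Gaussian log-likelihood of feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, vars_))

def classify(x, models):
    """Pick the class whose Gaussian gives x the highest likelihood."""
    return max(models, key=lambda c: log_likelihood(x, *models[c]))
```

Replacing each single Gaussian with a weighted sum of several would recover a true mixture model; the decision rule stays the same.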
2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13 397-400 2006 (peer-reviewed)
In this paper, we describe a Japanese speech corpus collected to investigate the speech variability of a specific speaker over short and long time periods, and we report the resulting variability of speech recognition performance. Even when a speaker uses a speaker-dependent speech recognition system, it is known that recognition performance varies depending on when the utterance was spoken, because speech quality varies from occasion to occasion even if the speaker and utterance remain constant. However, the relationship between intra-speaker speech variability and speech recognition performance is not well understood. Hence, we have been collecting speech data to investigate this relationship since November 2002. In this paper, we introduce our speech corpus and report speech recognition experiments using it. Experimental results show that the variability of recognition performance across different days is larger than the variability within a single day.
Jiajun Yan, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference, Melbourne Beach, Florida, USA, May 11-13, 2006 782-786 2006 (peer-reviewed)
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING 3878 458-469 2006 (peer-reviewed)
In this paper, we propose a Question Answering (QA) system that combines answer retrieval from a frequently-asked-questions database with retrieval from a document database, for the special domain of sightseeing information. A speech interface for this domain was implemented alongside the text interface, using an HMM acoustic model, a pronunciation lexicon, and an FSN language model built on the characteristics of Chinese sentence patterns. We adopt a synthesis model based on a statistical vector space model (VSM) and shallow language analysis for the sightseeing information. Experimental results showed that high accuracy can be achieved in this special domain and that the speech interface is usable for frequently asked questions about sightseeing information.
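The FAQ-retrieval half of such a system reduces to nearest-neighbor search in a vector space model. A minimal sketch with term-frequency vectors and cosine similarity (the FAQ entries are invented examples, and real systems would add term weighting and the shallow analysis mentioned above):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_answer(query, faq):
    """Return the answer of the FAQ (question, answer) pair most similar to the query."""
    q = Counter(query.split())
    best = max(faq, key=lambda qa: cosine(q, Counter(qa[0].split())))
    return best[1]
```

When the best FAQ match falls below a similarity threshold, the system would fall back to document retrieval, which is how the two databases can be combined.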
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING 3878 97-100 2006 (peer-reviewed)
In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on a probabilistic neural network (P-NNT) classifier. A feature vector is extracted from the text pair under consideration, containing features such as length, punctuation score, and cognate score. A set of manually aligned training data was used to train the probabilistic neural network, and another set was used for testing. Using the probabilistic neural network approach, an error reduction of 27% was achieved over the length-based approach when applied to English-Arabic parallel documents.
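A probabilistic neural network is essentially a Parzen-window classifier: each class score is an average of Gaussian kernels centered on that class's training vectors. The sketch below assumes a fixed kernel width `sigma` and the same kind of (length, cognate) feature vectors as above; both are illustrative choices, not the paper's settings.

```python
import math

def pnn_classify(x, training, sigma=0.5):
    """Probabilistic neural network: average Gaussian kernel response per class.

    training: dict mapping class label -> list of feature vectors.
    Returns the label with the highest kernel-density score at x.
    """
    scores = {}
    for label, samples in training.items():
        total = 0.0
        for p in samples:
            sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-sq_dist / (2 * sigma ** 2))
        scores[label] = total / len(samples)
    return max(scores, key=scores.get)
```

Unlike the GMM approach, no parameters are fit: the training vectors themselves form the "pattern layer", which is what makes PNN training essentially instantaneous.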