Research Achievements

黒岩 眞吾

クロイワ シンゴ  (Shingo Kuroiwa)

Basic Information

Affiliation
Professor, Graduate School of Engineering, Chiba University
Degree
Doctorate (Department of Electronic Engineering, Graduate School of Electro-Communications, The University of Electro-Communications)

Researcher Number
20333510
J-GLOBAL ID
200901017262764603
researchmap Member ID
1000356498

External Links

Career

 1

Papers

 125
  • Masato Nakayama, Takanobu Nishiura, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 968-+ 2008  Peer-reviewed
  • Shota Sato, Taro Kimura, Yasuo Horiuchi, Masafumi Nishida, Shingo Kuroiwa, Akira Ichikawa
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 545-+ 2008  Peer-reviewed
    In this paper, we describe a speech re-synthesis tool using the fundamental frequency (F0) generation model proposed by Fujisaki et al. and STRAIGHT, designed by Kawahara, which can be used for listening experiments by modifying F0 model parameters. To create the tool, we first established a method for automatically estimating F0 model parameters by using genetic algorithms. Next, we combined the proposed method and STRAIGHT. We can change the prosody of input speech by manually modifying the F0 model parameters with the tool and evaluate the relation between human perception and F0 model parameters. We confirmed the ability of this tool to make natural speech data that have various prosodic parameters.
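The abstract above estimates Fujisaki F0 model parameters with a genetic algorithm. The sketch below is only an illustration of that idea under simplifying assumptions: a single phrase command and a single accent command, invented parameter ranges, and a toy GA (truncation selection plus Gaussian mutation). It is not the authors' implementation, and the STRAIGHT re-synthesis step is not included.

```python
# Illustrative sketch: fitting simplified Fujisaki F0 model parameters with a
# genetic algorithm. Parameter names, ranges, and GA settings are assumptions
# made for this example only.
import numpy as np

rng = np.random.default_rng(0)

def fujisaki_f0(t, ln_fb, ap, t0, aa, t1, t2, alpha=3.0, beta=20.0):
    """Log-F0 contour with one phrase command and one accent command."""
    def gp(x):  # phrase control mechanism impulse response
        return np.where(x >= 0, alpha**2 * x * np.exp(-alpha * x), 0.0)
    def ga(x):  # accent control mechanism step response (ceiling at 0.9)
        return np.where(x >= 0, np.minimum(1 - (1 + beta * x) * np.exp(-beta * x), 0.9), 0.0)
    return ln_fb + ap * gp(t - t0) + aa * (ga(t - t1) - ga(t - t2))

def fitness(params, t, target_lnf0):
    return -np.mean((fujisaki_f0(t, *params) - target_lnf0) ** 2)  # negative MSE

def run_ga(t, target_lnf0, pop=60, gens=200):
    lo = np.array([np.log(60),  0.1, -0.5, 0.1, 0.0, 0.2])   # lower bounds per parameter
    hi = np.array([np.log(200), 1.0,  0.2, 1.0, 0.8, 1.5])   # upper bounds per parameter
    population = rng.uniform(lo, hi, size=(pop, 6))
    for _ in range(gens):
        scores = np.array([fitness(ind, t, target_lnf0) for ind in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]   # truncation selection
        children = parents[rng.integers(0, len(parents), pop - len(parents))].copy()
        children += rng.normal(0, 0.02, children.shape)              # Gaussian mutation
        population = np.clip(np.vstack([parents, children]), lo, hi)
    return population[np.argmax([fitness(ind, t, target_lnf0) for ind in population])]

# recover known toy parameters from a synthetic target contour
t = np.linspace(0, 1.5, 150)
true_params = (np.log(120), 0.5, -0.2, 0.6, 0.3, 0.9)
best = run_ga(t, fujisaki_f0(t, *true_params))
print("recovered parameters:", np.round(best, 3))
```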
  • Junko Minato, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    Engineering Letters 16(1) 172-177 2008  Peer-reviewed
  • Jiajun Yan, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    Engineering Letters 16(1) 166-171 2008  Peer-reviewed
  • David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    Engineering Letters 16(1) 160-165 2008  Peer-reviewed
  • Dapeng Yin, Min Shao, Fuji Ren, Shingo Kuroiwa
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING 3(1) 106-112 January 2008  Peer-reviewed
    Research on Chinese-Japanese machine translation has been ongoing for many years, and the field is becoming increasingly refined. In practical machine translation systems, simple and short Chinese sentences are processed reasonably well. However, the translation of complex, long Chinese sentences remains difficult. For example, these systems are still unable to solve the translation problem of complex 'BA' sentences. In this article, a new method of parsing 'BA' sentences for machine translation based on valency theory is proposed. A 'BA' sentence is one that contains the prepositional word 'BA'. The structural characteristic of a 'BA' sentence is that the original verb is placed after the object word. The object word after the 'BA' preposition is used as an adverbial modifier of an active word. First, a large number of grammar items are collected from Chinese grammar books, and elementary judgment rules are set by classifying and consolidating the collected grammar items. Then, these judgment rules are applied to actual Chinese text and modified by immediately checking their results. Rules are checked and modified using statistical information from an actual corpus. A five-segment model for 'BA' sentence translation is then derived from the above analysis. Finally, we applied the proposed model in our machine translation system and evaluated the experimental results. It achieved an accuracy rate of 91.3%, and this result verified the effectiveness of our five-segment model for 'BA' sentence translation. (C) 2007 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
  • Kyoko Osaka, Tetsuya Tanioka, Shuichi Ueno, Chiemi Kawanishi, Toshiko Tada, Shingo Kuroiwa, Fuji Ren
    International Journal for Human Caring Vol.12(No.1) 7-16 January 2008  Peer-reviewed
    We presume that the measurement of electroencephalographic (EEG) changes, which are considered physiological indicators, enables an objective understanding of changes in the emotions of those who have difficulty expressing them through facial expression or physical action. Generally, EEG is used in hospitals to examine encephalopathy and brain disorders. Using an electroencephalograph device to acquire digital data, we propose a method to objectively capture changes in the recognition state of people from changes in EEG activities (action potential), and a way to apply it in a clinical situation.
  • Kyoko Osaka, Seiji Tsuchiya, Fuji Ren, Shingo Kuroiwa, Tetsuya Tanioka, Rozzano C. Locsin
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL 11(1) 55-68 January 2008  Peer-reviewed
    We aim to develop a mechanism that can sympathize with humans, with the goal of reading the emotions a person feels from brain waves. As an initial stage of this research, we verify whether it can be judged from brain waves that a subject is impressed. Concretely, we use an electroencephalograph (EEG) to investigate whether the brain is in an active state when the subject declares having been impressed. Three kinds of evaluation methods are used in this research. The first is a statistical evaluation based on the strength of potentials. The second is an objective evaluation based on where brain-wave activity occurs. The third compares the subject's subjective reports with changes in the EEG. Because only two subjects participated this time and their attributes are biased, questions remain about the validity of the results. However, the results also make clear that a subject's state of being impressed can indeed be judged from the activity of brain waves.
  • Yu Zhang, Zhuoming Li, Fuji Ren, Shingo Kuroiwa
    Research in Computing Science Vol.32 330-340 November 2007  Peer-reviewed
    There have been some studies of spoken natural language dialog, and most of them have been developed successfully within specified domains. However, current human-computer interfaces only acquire the data needed to run their programs. Aiming at developing an affective dialog system, we have been exploring how to incorporate emotional aspects of dialog into existing dialog processing techniques. As a preliminary step toward this goal, we build a Chinese emotion classification model which is used to recognize the main affective attribute of a sentence or a text. Finally, we conducted experiments to evaluate our model.
  • Peilin Jiang, Ran Li, Fuji Ren, Shingo Kuroiwa, Nanning Zheng
    Research in Computing Science Vol.32 374-381 November 2007  Peer-reviewed
    Human-computer interface technology has faced the challenge of actively understanding the user's mind. Speaker detection is a primary technique in applications of human-computer interfaces (HCI) and in other applications such as surveillance systems, video conferencing, and multimedia database management in computer vision and speech recognition. This paper describes a novel method to detect the speaker with a probabilistic model of speaking behavior. After human face recognition, the special components of the lip under a nonlinear transformation in color space represent the specific mouth region and are then combined with groups of coherent motions. Next, the simple movements in the mouth region are modeled by hidden Markov models. The experimental results demonstrate that the model representing speaking is efficient and successful when applied to a driver video surveillance system.
  • Mohamed Abdel Fattah, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    COMPUTER SPEECH AND LANGUAGE 21(4) 594-608 October 2007  Peer-reviewed
    Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present two new approaches to align English-Arabic sentences in bilingual parallel corpora based on probabilistic neural network (P-NNT) and Gaussian mixture model (GMM) classifiers. A feature vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was assigned to train the probabilistic neural network and Gaussian mixture model. Another set of data was used for testing. Using the probabilistic neural network and Gaussian mixture model approaches, we could achieve error reductions of 27% and 50%, respectively, over the length based approach when applied on a set of parallel English-Arabic documents. In addition, the results of (P-NNT) and (GMM) outperform the results of the combined model which exploits length, punctuation and cognates in a dynamic framework. The GMM approach outperforms Melamed and Moore's approaches too. Moreover these new approaches are valid for any language pair and are quite flexible since the feature vector may contain more, less or different features, such as a lexical matching feature and Hanzi characters in Japanese-Chinese texts, than the ones used in the current research. (c) 2007 Elsevier Ltd. All rights reserved.
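As a rough illustration of the feature-vector idea above (length, punctuation, and cognate scores classified per class), the sketch below uses simplified feature definitions and scikit-learn's GaussianMixture fitted separately to aligned and misaligned pairs. The real features, training data, and the P-NNT variant from the paper are not reproduced.

```python
# Illustrative sketch of sentence-pair classification from a small feature
# vector; feature definitions and training data are stand-ins, not the paper's.
import re
import numpy as np
from sklearn.mixture import GaussianMixture

def features(src: str, tgt: str) -> np.ndarray:
    len_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    punct = lambda s: re.findall(r"[.,;:!?]", s)
    p_src, p_tgt = punct(src), punct(tgt)
    punct_score = (min(len(p_src), len(p_tgt)) + 1) / (max(len(p_src), len(p_tgt)) + 1)
    # crude "cognate" score: shared 4-character token prefixes
    pre = lambda s: {w[:4].lower() for w in s.split() if len(w) >= 4}
    cognate = len(pre(src) & pre(tgt)) / (len(pre(src) | pre(tgt)) + 1e-9)
    return np.array([len_ratio, punct_score, cognate])

# X_pos / X_neg would come from manually aligned (and deliberately misaligned)
# training pairs; random values stand in here just to keep the sketch runnable.
rng = np.random.default_rng(1)
X_pos = rng.normal([0.8, 0.9, 0.3], 0.1, size=(200, 3))
X_neg = rng.normal([0.4, 0.5, 0.05], 0.1, size=(200, 3))

gmm_pos = GaussianMixture(n_components=2, random_state=0).fit(X_pos)
gmm_neg = GaussianMixture(n_components=2, random_state=0).fit(X_neg)

def is_aligned(src: str, tgt: str) -> bool:
    x = features(src, tgt).reshape(1, -1)
    return gmm_pos.score(x) > gmm_neg.score(x)   # compare per-class log-likelihoods

print(is_aligned("The committee approved the report.", "Approved: the committee report."))
```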
  • Jiajun Yan, David B. Bracewell, Shingo Kuroiwa, Fuji Ren
    ACM Transactions on Speech and Language Processing 4(2) 5 May 1, 2007  Peer-reviewed
    Semantic analysis is a standard tool in the Natural Language Processing (NLP) toolbox with widespread applications. In this article, we look at tagging part of the Penn Chinese Treebank with semantic dependency. Then we take this tagged data to train a maximum entropy classifier to label the semantic relations between headwords and dependents to perform semantic analysis on Chinese sentences. The classifier was able to achieve an accuracy of over 84%. We then analyze the errors in classification to determine the problems and possible solutions for this type of semantic analysis. © 2007 ACM.
  • Lei Yu, Jia Ma, Fuji Ren, Shingo Kuroiwa
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 2, PROCEEDINGS 574-+ 2007  Peer-reviewed
    The rapid growth of the Internet has resulted in enormous amounts of information that have become more difficult to access efficiently. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose concept chains that link semantically-related concepts based on the HowNet knowledge database to improve the performance of text summarization and to suit Chinese text. Lexical chains are a technique for identifying semantically-related terms in text. The resulting concept chains are then used to identify candidate sentences useful for extraction. Moreover, another method based on structural features, which makes the summary of the text more general in content and more balanced, is also proposed. The final experimental results proved the effectiveness of our methods.
  • Peilin Jiang, Hua Xiang, Fuji Ren, Shingo Kuroiwa, Nanning Zheng
    MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE 4827 1046-+ 2007  Peer-reviewed
    Human Computer Interaction (HCI) technology has emerged in different fields and applications in computer vision and recognition systems, such as virtual environments, video games, e-business and multimedia management. In this paper we propose a framework for designing the Mental State Transition (MST) of a human being or virtual character. The expressions of human emotion can easily be observed in facial expressions, gestures, sound and other visual characteristics, but the underlying MST models in affective data are usually hidden. We analyze the framework of MST, employ DBNs to construct the MST networks, and finally implement an experiment to derive the ground truth of the data and verify the effectiveness.
  • Kazuyuki Matsumoto, Fuji Ren, Shingo Kuroiwa, Seiji Tsuchiya
    MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE 4827 1035-+ 2007  Peer-reviewed
    Emotion recognition aims to make computers understand the ambiguous information of human emotion. Recently, research on emotion recognition has been progressing actively in various fields such as natural language processing, speech signal processing, image data processing and brain wave analysis. We propose a method to recognize emotion in dialogue text by using an originally created Emotion Word Dictionary. The words in the dictionary are weighted according to their occurrence rates in an existing emotion expression dictionary. We also propose a method to judge the object of emotion and emotion expressivity in dialogue sentences. The experiment using 1,190 sentences showed an accuracy of about 80%.
  • Shingo Kuroiwa, Masashi Takashina, Satoru Tsuge, Ren Fuji
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 1045-1048 2007  Peer-reviewed
    In this paper, we propose a non-realtime speech bandwidth extension method using HMM-based speech recognition and HMM-based speech synthesis. In the proposed method, first, the phoneme-state sequence is estimated from the bandlimited speech signals using the speech recognition technique. Next, for estimating spectrum envelopes of lost high-frequency components, an HMM-based speech synthesis technique generates a synthetic speech signal (spectrum sequence) according to the predicted phoneme-state sequence. Since both speech recognition and speech synthesis take into account dynamic feature vectors, we can obtain a smoothly varying spectrum sequence. For evaluating the proposed method, we conducted subjective and objective experiments. The experimental results show the effectiveness of the proposed method for bandwidth extension. However, the proposed method needs more improvement in speech quality.
  • Norihide Kitaoka, Kazumasa Yamamoto, Tomohiro Kusamizu, Seiichi Nakagawa, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2 607-+ 2007  Peer-reviewed
    Voice activity detection (VAD) plays an important role in speech processing, including speech recognition, speech enhancement, and speech coding in noisy environments. We developed an evaluation framework for VAD in such environments, called Corpus and Environment for Noisy Speech Recognition 1 Concatenated (CENSREC-1-C). This framework consists of noisy continuous digit utterances and evaluation tools for VAD results. By adopting two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance, we provide the evaluation results of a power-based VAD method as a baseline. When VAD is used in a speech recognizer, the detected speech segments are extended to avoid the loss of speech frames and the pause segments are then absorbed by a pause model. We investigate the balance of an explicit segmentation by VAD and an implicit segmentation by a pause model using an experimental simulation of segment extension and show that a small extension improves speech recognition.
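A minimal sketch of a power-based VAD with the segment-extension idea discussed above; the frame size, threshold, and extension length are illustrative choices, not CENSREC-1-C settings.

```python
# Toy power-based VAD: frames above a threshold relative to the peak frame
# energy are marked as speech, then each detected segment is extended by a few
# frames on both sides. All constants are illustrative.
import numpy as np

def frame_energy_db(signal, frame_len=400, hop=160):
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len, hop)]
    return np.array([10 * np.log10(np.mean(f ** 2) + 1e-12) for f in frames])

def vad(signal, threshold_db=20.0, extend_frames=5):
    energy = frame_energy_db(signal)
    speech = energy > (energy.max() - threshold_db)   # within 20 dB of the peak
    extended = speech.copy()
    for i in np.flatnonzero(speech):                  # extend each speech frame
        lo, hi = max(0, i - extend_frames), min(len(speech), i + extend_frames + 1)
        extended[lo:hi] = True
    return extended

# toy signal: silence, a burst of "speech-like" noise, silence
rng = np.random.default_rng(0)
sig = np.concatenate([rng.normal(0, 0.01, 8000),
                      rng.normal(0, 0.5, 16000),
                      rng.normal(0, 0.01, 8000)])
decision = vad(sig)
print(f"{decision.sum()} of {decision.size} frames marked as speech")
```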
  • David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    International Conference on Artificial Intelligence and Pattern Recognition, AIPR-07, Orlando, Florida, USA, July 9-12, 2007 22-27 2007  Peer-reviewed
  • Jiajun Yan, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    International Conference on Artificial Intelligence and Pattern Recognition, AIPR-07, Orlando, Florida, USA, July 9-12, 2007 17-21 2007  Peer-reviewed
  • Shingo Kuroiwa, Satoru Tsuge, Masahiko Kita, Fuji Ren
    IJCLCLP 12(3) 2007  Peer-reviewed
  • Ye Yang, Song Liu, Shingo Kuroiwa, Fuji Ren
    PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07) 361-+ 2007  Peer-reviewed
    This paper constructs a question answering system for the Confucian Analects. As a result of context change and differences in word connotations between modern Chinese and ancient Chinese, the accuracy of content-based retrieval and category-based retrieval in classical literature is quite low. In view of this, the paper establishes a categories and pragmatics information base for the Confucian Analects. It also proposes a retrieval method based on pragmatics information and categories. To increase accuracy and efficiency, a category keyword collection and a question type keyword table are established as well. When the system recognizes the type and category of the user's question, it uses keyword semantic matching. Namely, the category keyword collection and the question type keyword table are separately used to decide the category and the type. The experiments evidenced the effectiveness of the answer extraction approach based on pragmatics information, especially for queries with deep meaning.
  • Mohamed Abdel Fattah, Fuji Ren, Shingo Kuroiwa
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS 16(6) 423-434 December 2006  Peer-reviewed
    Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to align sentences in bilingual parallel corpora based on a feed-forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data has been assigned to train the feed-forward neural network. Another set of data was used for testing. Using this new approach, we could achieve an error reduction of 60% over the length-based approach when applied on English-Arabic parallel documents. Moreover, this new approach is valid for any language pair and is quite flexible, since the feature parameter vector may contain more, fewer, or different features than those we used in our system, such as a lexical match feature.
  • MA Fattah, FJ Ren, S Kuroiwa
    INFORMATION PROCESSING & MANAGEMENT 42(4) 1003-1016 July 2006  Peer-reviewed
    Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications because a word often conveys complex meanings decomposable into several morphemes (i.e. prefix, stem, suffix). By segmenting words into morphemes, we could improve the performance of English/Arabic translation pair extraction from parallel texts. This paper describes two algorithms and their combination to automatically extract an English/Arabic bilingual dictionary from parallel texts that exist in the Internet archive, after using an Arabic light stemmer as a preprocessing step. Before using the Arabic light stemmer, the total system precision and recall were 88.6% and 81.5% respectively; after applying the Arabic light stemmer to the Arabic documents, the system precision and recall increased to 91.6% and 82.6% respectively. The algorithms have certain variables whose values can be changed to control the system precision and recall. As with most such systems, the accuracy of our system is directly proportional to the number of sentence pairs used. However, our system is able to extract translation pairs from a very small parallel corpus. This new system can extract translations from only two sentences in one language and two sentences in the other language if the requirements of the system are fulfilled. Moreover, this system is able to extract word pairs that are translations of each other, synonyms, and explanations of a word in the other language. By controlling the system variables, we could achieve 100% precision for the output bilingual dictionary with a small recall. (c) 2005 Elsevier Ltd. All rights reserved.
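The abstract above does not spell out its two extraction algorithms, so the sketch below only illustrates the general idea of mining candidate translation pairs from sentence-aligned text with a standard co-occurrence (Dice coefficient) score. It is a textbook technique standing in for the paper's own algorithms; the Arabic light stemmer is not reproduced, and the toy corpus uses French rather than Arabic purely for readability.

```python
# Generic Dice-coefficient mining of candidate translation pairs from
# sentence-aligned text. Thresholds and data are invented for the example.
from collections import Counter
from itertools import product

def dice_pairs(aligned_sentences, min_score=0.8):
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in aligned_sentences:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    pairs = {}
    for (s, t), c in pair_count.items():
        dice = 2 * c / (src_count[s] + tgt_count[t])   # co-occurrence score
        if dice >= min_score:
            pairs[(s, t)] = round(dice, 2)
    return pairs

corpus = [("the cat sleeps", "le chat dort"),
          ("the dog sleeps", "le chien dort"),
          ("the cat eats", "le chat mange")]
print(dice_pairs(corpus))
```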
  • HQ Hu, PL Jiang, FJ Ren, S Kuroiwa
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D(6) 1848-1859 June 2006  Peer-reviewed
    In this paper, we propose the construction of a web-based Question Answering (QA) system for a restricted domain, which combines three resource information databases for the retrieval mechanism, including a Question&Answer database, a special domain documents database and the web resources retrieved by the Google search engine. We describe a new retrieval technique that integrates a probabilistic technique based on OkapiBM25 and a semantic analysis based on the ontology of the HowNet knowledge base and a special-domain HowNet created for the restricted domain. Furthermore, we provide a method of question expansion by computing word semantic similarity. The system is first developed for a middle-size domain of sightseeing information. The experiments proved the efficiency of our method for the restricted domain and showed that it is feasible to transfer to other domains expediently using the proposed method.
  • MA Fattah, F Ren, S Kuroiwa
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D(5) 1712-1719 May 2006  Peer-reviewed
    In the European Telecommunication Standards Institute (ETSI), Distributed Speech Recognition (DSR) front-end, the distortion added due to feature compression on the front end side increases the variance flooring effect, which in turn increases the identification error rate. The penalty incurred in reducing the bit rate is the degradation in speaker recognition performance. In this paper, we present a nontraditional solution for the previously mentioned problem. To reduce the bit rate, a speech signal is segmented at the client, and the most effective phonemes (determined according to their type and frequency) for speaker recognition are selected and sent to the server. Speaker recognition occurs at the server. Applying this approach to YOHO corpus, we achieved an identification error rate (ER) of 0.05% using an average segment of 20.4% for a testing utterance in a speaker identification task. We also achieved an equal error rate (EER) of 0.42% using an average segment of 15.1% for a testing utterance in a speaker verification task.
  • S Kuroiwa, Y Umeda, S Tsuge, F Ren
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D(3) 1074-1081 March 2006  Peer-reviewed
    In this paper, we propose a distributed speaker recognition method using a nonparametric speaker model and Earth Mover's Distance (EMD). In distributed speaker recognition, the quantized feature vectors are sent to a server. The Gaussian mixture model (GMM), the traditional method used for speaker recognition, is trained using the maximum likelihood approach. However, it is difficult to fit continuous density functions to quantized data. To overcome this problem, the proposed method represents each speaker model with a speaker-dependent VQ code histogram designed by registered feature vectors and directly calculates the distance between the histograms of speaker models and testing quantized feature vectors. To measure the distance between each speaker model and testing data, we use EMD which can calculate the distance between histograms with different bins. We conducted text-independent speaker identification experiments using the proposed method. Compared to results using the traditional GMM, the proposed method yielded relative error reductions of 32% for quantized data.
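A sketch of the non-parametric matching idea above: each speaker is represented by a VQ code histogram, and histograms are compared with the Earth Mover's Distance using the distance between codewords as the ground cost. The codebook size, feature dimensionality, and data below are toy values, and the EMD is solved with a generic linear program rather than the paper's implementation.

```python
# EMD between VQ code histograms, solved as a transportation linear program.
# All model sizes and "features" are invented for the example.
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(p, q, ground_cost):
    """EMD between histograms p (m,) and q (n,), each summing to 1."""
    m, n = len(p), len(q)
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1    # flow out of source bin i equals p[i]
    for j in range(n):
        A_eq[m + j, j::n] = 1             # flow into target bin j equals q[j]
    b_eq = np.concatenate([p, q])
    # drop one redundant constraint to keep the equality system full rank
    res = linprog(ground_cost.reshape(-1), A_eq=A_eq[:-1], b_eq=b_eq[:-1], bounds=(0, None))
    return res.fun

rng = np.random.default_rng(0)
codebook, _ = kmeans2(rng.normal(size=(2000, 12)), 64, minit='++', seed=0)   # shared codebook

def vq_histogram(features):
    codes = cdist(features, codebook).argmin(axis=1)
    hist = np.bincount(codes, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# enrolled speaker model vs. two test utterances (toy cepstral-like features)
speaker_model = vq_histogram(rng.normal(0.0, 1.0, size=(500, 12)))
same_speaker  = vq_histogram(rng.normal(0.0, 1.0, size=(300, 12)))
other_speaker = vq_histogram(rng.normal(0.8, 1.2, size=(300, 12)))

cost = cdist(codebook, codebook)          # ground distance between codewords
print("same-speaker EMD :", emd(speaker_model, same_speaker, cost))
print("other-speaker EMD:", emd(speaker_model, other_speaker, cost))
```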
  • Lei Yu, Mengge Liu, Fuji Ren, Shingo Kuroiwa
    PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION 426-429 2006  Peer-reviewed
    The large amount of lengthy on-line information does not fit well on mobile devices. To solve this problem, we propose a method which collects original news text from on-line sources and extracts summary sentences from it automatically. On this basis, we adopt WML (Wireless Markup Language) to build a news website that lets mobile devices browse the news summaries. The system is mainly made up of Automatic News Collection and Auto Text Summarization. Our experimental results proved the effectiveness of the method.
  • Mohamed Abdel Fattah, Fuji Ren, Shingo Kuroiwa
    PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION 370-373 2006  Peer-reviewed
    In the present study, we present different approaches for extracting transliterated proper noun pairs from parallel corpora, based on different similarity measures between the English and Romanized Arabic proper nouns under consideration. The strength of our new system is that it works well for low-frequency words. We evaluate the presented new approaches using an English-Arabic parallel corpus. Most of our results outperform previously published results in terms of precision, recall and F-Measure.
  • Shingo Kuroiwa, Satoru Tsuge, Masahiko Kita, Fuji Ren
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS 4274 539-+ 2006  Peer-reviewed
    In this paper, we present the evaluation results of our proposed text-independent speaker recognition method based on the Earth Mover's Distance (EMD) using the ISCSLP2006 Chinese speaker recognition evaluation corpus developed by the Chinese Corpus Consortium (CCC). The EMD based speaker recognition (EMD-SR) was originally designed to apply to a distributed speaker identification system, in which the feature vectors are compressed by vector quantization at a terminal and sent to a server that executes a pattern matching process. In this structure, we had to train speaker models using quantized data, so we utilized a non-parametric speaker model and EMD. From the experimental results on a Japanese speech corpus, EMD-SR showed higher robustness to the quantized data than the conventional GMM technique. Moreover, it achieved higher accuracy than the GMM even if the data were not quantized. Hence, we have taken the challenge of the ISCSLP2006 speaker recognition evaluation by using EMD-SR. Since the identification tasks defined in the evaluation were on an open-set basis, we introduce a new speaker verification module in this paper. Evaluation results showed that EMD-SR achieves a 99.3% Identification Correctness Rate in a closed-channel speaker identification task.
  • Shingo Kuroiwa, Satoru Tsuge, Fuji Ren
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 1105-1108 2006  Peer-reviewed
    In recent years, IP telephone service has spread rapidly. However, an unavoidable problem of IP telephone service is deterioration of speech due to packet loss, which often occurs on wireless networks. To overcome this problem, we propose a novel lost speech reconstruction method using speech recognition based on Missing Feature Theory and HMM-based speech synthesis. The proposed method uses linguistic information and can deal with the lack of syllable units which conventional methods are unable to handle. We conducted subjective and objective evaluation experiments under speaker independent conditions. These results showed the effectiveness of the proposed method. Although there is a processing delay in the proposed method, we believe that this method will open up new applications for speech recognition and speech synthesis technology.
  • Dapeng Yin, Min Shao, Peilin Jiang, Fuji Ren, Shingo Kuroiwa
    COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS 4114 930-935 2006  Peer-reviewed
    Quantifiers and numerals often cause mistakes in Chinese-Japanese machine translation. In this paper, an approach is proposed based on syntactic features after classification. Using the differences in type and position of quantifiers between Chinese and Japanese, quantifier translation rules were acquired. Evaluation was conducted using the acquired translation rules. Finally, the adaptability of the experimental data was verified and the methods achieved an accuracy of 90.75%, which showed that they were effective in processing quantifiers and numerals.
  • Junko Minato, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS 4114 924-929 2006  Peer-reviewed
    In this paper, we build a Japanese emotion corpus and perform statistical analysis on it. We manually entered about 1,200 example dialogue sentences. We collected statistical information from the corpus to analyze the way emotion is expressed in Japanese dialogue. Such statistics should prove useful for dealing with emotion in natural language. We believe the collected statistics accurately describe emotion in Japanese dialogue.
  • David B. Bracewell, Junko Minato, Fuji Ren, Shingo Kuroiwa
    COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS 4114 918-923 2006  Peer-reviewed
    Authors of news stories, through their choice of words and phrasing, inject an underlying emotion into their stories. A story about the same event or person can have radically different emotions depending on the author, newspaper, and nationality. In this paper we propose a system to judge the emotion of a news article based on emotion word, idiom and modifier dictionaries. This type of system allows one to judge world opinion on varying topics by looking at the emotion used within news articles about the topic.
  • Kazuyuki Matsumoto, Ren Fuji, Shingo Kuroiwa
    COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS 4114 902-911 2006  Peer-reviewed
    Conventional approaches to emotion estimation from text have mainly estimated superficial emotion expressions. However, emotions may be contained in a human utterance even if no emotion expression appears in it. In this paper, we propose an emotion estimation algorithm for conversation sentences. We assigned rules of emotion occurrence to 1,616 sentence patterns. In addition, we developed a dictionary consisting of emotional words and emotional idioms. The proposed method estimates the emotions in a sentence by matching the sentence against the emotion occurrence patterns and their rules. Furthermore, two or more emotions contained in a sentence can be obtained by calculating emotion parameters. We constructed an experimental system based on the proposed method for evaluation. We analyzed weblog data containing 253 sentences with the system and conducted an experiment to evaluate emotion estimation accuracy. As a result, we obtained an estimation accuracy of about 60%.
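A toy illustration of the dictionary-driven part of the method above: emotion words carry weights per emotion category, and a sentence accumulates the weights of the words it contains. The dictionary entries and weights are invented for the example; the paper's 1,616 sentence patterns and occurrence rules are not reproduced.

```python
# Dictionary-based emotion scoring with invented entries and weights.
from collections import defaultdict

EMOTION_DICT = {
    "happy":     {"joy": 0.9},
    "delighted": {"joy": 1.0},
    "angry":     {"anger": 1.0},
    "furious":   {"anger": 1.0, "disgust": 0.3},
    "sad":       {"sadness": 0.9},
    "tears":     {"sadness": 0.6, "joy": 0.2},
}

def estimate_emotions(sentence: str) -> dict:
    scores = defaultdict(float)
    for word in sentence.lower().split():
        for emotion, weight in EMOTION_DICT.get(word.strip(".,!?"), {}).items():
            scores[emotion] += weight
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

print(estimate_emotions("She was delighted, almost in tears."))
# e.g. {'joy': 1.2, 'sadness': 0.6}
```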
  • Jiajun Yan, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS 4114 893-901 2006  Peer-reviewed
    In this paper we present a semantic analyzer for aiding emotion recognition in Chinese. The analyzer uses a decision tree to assign semantic dependency relations between headwords and modifiers. It is able to achieve an accuracy of 83.5%. The semantic information is combined with rules for Chinese verbs containing emotion to describe the emotion of the people in the sentence. The rules give information on how to assign emotion to agents, receivers, etc. depending on the verb in the sentence.
  • Mohamed Abdel Fattah, Fuji Ren, Shingo Kuroiwa
    COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS 4114 748-753 2006  Peer-reviewed
    In this paper, we present a new approach to align sentences in bilingual parallel corpora based on the use of the linguistic information of the text pair in a Gaussian mixture model (GMM) classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, cognate score and a bilingual lexicon extracted from the parallel corpus under consideration. A set of manually prepared training data has been assigned to train the Gaussian mixture model. Another set of data was used for testing. Using the Gaussian mixture model approach, we could achieve an error reduction of 160% over the length-based approach when applied on English-Arabic parallel documents. In addition, the results of the GMM outperform the results of the combined model which exploits length, punctuation, cognates and a bilingual lexicon in a dynamic framework.
  • Satoru Tsuge, Masami Shishibori, Kenji Kita, Fuji Ren, Shingo Kuroiwa
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13 397-400 2006  Peer-reviewed
    In this paper, we describe a Japanese speech corpus collected for investigating the speech variability of a specific speaker over short and long time periods, and then report the variability of speech recognition performance over short and long time periods. Even when speakers use a speaker-dependent speech recognition system, it is known that speech recognition performance varies depending on when the utterance was spoken. This is because speech quality varies by occasion even if the speaker and utterance remain constant. However, the relationships between intra-speaker speech variability and speech recognition performance are not clear. Hence, we have been collecting speech data to investigate these relationships since November 2002. In this paper, we introduce our speech corpus and report speech recognition experiments using our corpus. Experimental results show that the variability of recognition performance across different days is larger than the variability of recognition performance within a day.
  • Jiajun Yan, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference, Melbourne Beach, Florida, USA, May 11-13, 2006 782-786 2006  Peer-reviewed
  • HQ Hu, FJ Ren, S Kuroiwa, SW Zhang
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING 3878 458-469 2006  Peer-reviewed
    In this paper, we propose the construction of a Question Answering (QA) system, which synthesizes answer retrieval from a frequently asked questions database and a documents database, for a special domain of sightseeing information. A speech interface for the special domain was implemented along with the text interface, using an HMM acoustic model, a pronunciation lexicon, and an FSN language model built on the basis of the features of Chinese sentence patterns. We consider a synthetic model based on a statistical VSM and shallow language analysis for sightseeing information. Experimental results showed that high accuracy can be achieved for the special domain and that the speech interface works for frequently asked questions about sightseeing information.
  • MA Fattah, F Ren, S Kuroiwa
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING 3878 97-100 2006  Peer-reviewed
    In this paper, we present a new approach to align sentences in bilingual parallel corpora based on a probabilistic neural network (P-NNT) classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually aligned training data was used to train the probabilistic neural network. Another set of data was used for testing. Using the probabilistic neural network approach, an error reduction of 27% was achieved over the length based approach when applied on English-Arabic parallel documents.
  • Mohamed Abdel Fattah, Fuji Ren, Shingo Kuroiwa
    Int. Arab J. Inf. Technol. 3(1) 28-34 2006  Peer-reviewed
  • David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    Engineering Letters 13(2) 216-224 2006  Peer-reviewed
  • M. Fujimoto, S. Nakamura, K. Takeda, S. Kuroiwa, T. Yamada, N. Kitaoka, K. Yamamoto, M. Mizumachi, T. Nishiura, A. Sasou, C. Miyajima, T. Endo
    Proc. International Workshop on Realworld Multimedia Corpora in Mobile Environment (RWCinME2005) 53-60 April 2005  Peer-reviewed
  • S Nakamura, K Takeda, K Yamamoto, T Yamada, S Kuroiwa, N Kitaoka, T Nishiura, A Sasou, M Mizumachi, C Miyajima, M Fujimoto, T Endo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E88D(3) 535-544 March 2005  Peer-reviewed
    This paper introduces an evaluation framework for Japanese noisy speech recognition named AURORA-2J. Speech recognition systems must still be improved to be robust to noisy environments, but this improvement requires development of the standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had significant impact on noisy speech recognition research. The AURORA-2J is a Japanese connected digits corpus and its evaluation scripts are designed in the same way as Aurora 2 with the help of European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, baseline scripts, and its baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker, revealing the degree of the individual difference of the recognition performance. We also propose categorization of modifications, applied to the original HTK baseline system, which helps in comparing the systems and in recognizing technologies that improve the performance best within the same category.
  • Shingo Kuroiwa, Yoshiyuki Umeda, Satoru Tsuge, Fuji Ren
    INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005 3085-3088 2005  Peer-reviewed
  • Masakiyo Fujimoto, Satoshi Nakamura, Kazuya Takeda, Shingo Kuroiwa, Takeshi Yamada, Norihide Kitaoka, Kazumasa Yamamoto, Mitsunori Mizumachi, Takanobu Nishiura, Akira Sasou, Chiyomi Miyajima, Toshiki Endo
    Proceedings - International Workshop on Biomedical Data Engineering, BMDE2005 2005 1208 2005  Peer-reviewed
    This paper introduces a common database, an evaluation framework, and its baseline recognition results for in-car speech recognition, CENSREC-3, as an outcome of the IPSJ-SIG SLP Noisy Speech Recognition Evaluation Working Group. CENSREC-3, which is a sequel to AURORA-2J, is designed as an evaluation framework for isolated word recognition in real driving car environments. Speech data was collected using 2 microphones, a close-talking microphone and a hands-free microphone, under 16 carefully controlled driving conditions, i.e., combinations of 3 car speeds and 5 car conditions. CENSREC-3 provides 6 evaluation environments which are designed using speech data collected in these car conditions. © 2005 IEEE.
  • PL Jiang, H Xiang, F Ren, S Kuroiwa
    EMBEDDED AND UBIQUITOUS COMPUTING - EUC 2005 3824 1026-1035 2005  Peer-reviewed
    The study of human-computer interaction is now one of the most popular research domains across computer science and psychology. Many of the essential issues recently focus not only on physical computing but also on affective computing. The emotional states of human beings can dramatically affect their actions, so it is important for a computer to understand what people feel at a given time. In this paper, we propose a novel method to predict the future emotional state of a person from the current emotional state and affective factors, using an advanced mental state transition network [1]. A psychological experiment with about 100 participants was conducted to obtain the structure and the coefficients of the model. A test experiment was also conducted to verify the prediction validity of this model.
  • T Endo, S Kuroiwa, S Nakamura
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E87D(5) 1119-1126 May 2004  Peer-reviewed
    This paper addresses problems involved in performing speech recognition over mobile and IP networks. The main problem is speech data loss caused by packet loss in the network. We present two missing-feature-based approaches that recover lost regions of speech data. These approaches are based on the reconstruction of missing frames or on marginal distributions. For comparison, we also use a packing method, which skips lost data. We evaluate these approaches with packet loss models, i.e., random loss and Gilbert loss models. The results show that the marginal-distribution-based technique is most effective for a packet loss environment; the degradation of word accuracy is only 5% when the packet loss rate is 30% and only 3% when the mean burst loss length is 24 frames in the case of the DSR front-end. The simple data imputation method is also effective in the case of clean speech.
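A minimal sketch of the marginal-distribution idea above: with a diagonal-covariance Gaussian mixture, the likelihood of a partially observed frame can be computed by evaluating each Gaussian only on the feature dimensions that survived packet loss. The model values below are toy numbers, not a trained acoustic model.

```python
# Marginalising missing feature dimensions out of a diagonal-covariance GMM.
import numpy as np
from scipy.stats import norm

# toy 2-component diagonal GMM over 4-dimensional features
weights = np.array([0.6, 0.4])
means   = np.array([[0.0, 1.0, -1.0, 0.5],
                    [2.0, -1.0, 0.0, 1.5]])
stds    = np.array([[1.0, 0.5, 1.0, 0.8],
                    [0.7, 1.0, 0.5, 1.0]])

def marginal_loglik(frame, present_mask):
    """Log-likelihood of a frame using only the observed dimensions."""
    obs = np.flatnonzero(present_mask)
    if obs.size == 0:
        return 0.0                          # nothing observed: likelihood of 1
    comp_loglik = [
        np.log(w) + norm.logpdf(frame[obs], means[k, obs], stds[k, obs]).sum()
        for k, w in enumerate(weights)
    ]
    return np.logaddexp.reduce(comp_loglik)

frame = np.array([0.1, 0.9, -1.2, 0.4])
full  = marginal_loglik(frame, np.array([1, 1, 1, 1], dtype=bool))
lossy = marginal_loglik(frame, np.array([1, 0, 0, 1], dtype=bool))   # dims 1, 2 lost
print(f"log-likelihood, all dims: {full:.2f}   observed dims only: {lossy:.2f}")
```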
  • MA Fattah, F Ren, S Kuroiwa
    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 2, PROCEEDINGS 298-302 2004  Peer-reviewed
    A parallel corpus is a very important tool for constructing a good machine translation system or conducting natural language processing research for cross-language information retrieval. The Internet archive is a good source of parallel documents in different languages. In order to construct a good parallel corpus from the Internet archive, a bilingual dictionary that contains word pairs which may not exist in commercial dictionaries is a must. Extracting a bilingual dictionary from Internet parallel documents is important for adding words that are absent from traditional dictionaries. This paper describes two algorithms to automatically extract an English/Arabic bilingual dictionary from parallel texts that exist in the Internet archive. The system should preferably be useful for many different language pairs. As with most such systems, the accuracy of our system is directly proportional to the number of sentence pairs used. By controlling the system parameters, we could achieve 100% precision for the output bilingual dictionary, but the size of the dictionary will be smaller.

MISC

 590

Presentations

 30

Works

 5

Research Projects (Joint Research and Competitive Funding)

 17