Researcher Achievements

黒岩 眞吾

クロイワ シンゴ  (Shingo Kuroiwa)

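Basic Information

Affiliation
Professor, Graduate School of Engineering, Chiba University
Degree
Doctorate (Department of Electronic Engineering, Graduate School of Electro-Communications, The University of Electro-Communications)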

Researcher Number
20333510
J-GLOBAL ID
200901017262764603
researchmap Member ID
1000356498

Career

 1

Papers

 132
  • S Nakamura, K Takeda, K Yamamoto, T Yamada, S Kuroiwa, N Kitaoka, T Nishiura, A Sasou, M Mizumachi, C Miyajima, M Fujimoto, T Endo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E88D(3) 535-544 March 2005  Peer-reviewed
    This paper introduces an evaluation framework for Japanese noisy speech recognition named AURORA-2J. Speech recognition systems must still be improved to be robust to noisy environments, but this improvement requires development of the standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had significant impact on noisy speech recognition research. The AURORA-2J is a Japanese connected digits corpus and its evaluation scripts are designed in the same way as Aurora 2 with the help of European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, baseline scripts, and its baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker, revealing the degree of the individual difference of the recognition performance. We also propose categorization of modifications, applied to the original HTK baseline system, which helps in comparing the systems and in recognizing technologies that improve the performance best within the same category.
  • Shingo Kuroiwa, Yoshiyuki Umeda, Satoru Tsuge, Fuji Ren
    INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005 3085-3088 2005  Peer-reviewed
  • Masakiyo Fujimoto, Satoshi Nakamura, Kazuya Takeda, Shingo Kuroiwa, Takeshi Yamada, Norihide Kitaoka, Kazumasa Yamamoto, Mitsunori Mizumachi, Takanobu Nishiura, Akira Sasou, Chiyomi Miyajima, Toshiki Endo
    Proceedings - International Workshop on Biomedical Data Engineering, BMDE2005 2005 1208 2005  Peer-reviewed
    This paper introduces a common database, an evaluation framework, and its baseline recognition results for in-car speech recognition, CENSREC-3, as an outcome of IPSJ-SIG SLP Noisy Speech Recognition Evaluation Working Group. CENSREC-3, which is a sequel to AURORA-2J, is designed as the evaluation framework of isolated word recognition in real driving car environments. Speech data was collected using 2 microphones, a close-talking microphone and a hands-free microphone, under 16 carefully controlled driving conditions, i.e., combinations of 3 car speeds and 5 car conditions. CENSREC-3 provides 6 evaluation environments which are designed using speech data collected in these car conditions. © 2005 IEEE.
  • PL Jiang, H Xiang, F Ren, S Kuroiwa
    EMBEDDED AND UBIQUITOUS COMPUTING - EUC 2005 3824 1026-1035 2005  Peer-reviewed
    The study of human-computer interaction is now the most popular research domain across computer science and psychology. Most of the essential issues now concern not only physical computing but also affective computing. The emotional states of human beings can dramatically affect their actions. It is important for a computer to understand what people feel at a given time. In this paper, we propose a novel method to predict the future emotional state of a person from the current emotional state and affective factors by an advanced mental state transition network [1]. A psychological experiment with about 100 participants was conducted to obtain the structure and the coefficients of the model. A test experiment was also conducted to verify the prediction validity of this model.
  • T Endo, S Kuroiwa, S Nakamura
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E87D(5) 1119-1126 May 2004  Peer-reviewed
    This paper addresses problems involved in performing speech recognition over mobile and IP networks. The main problem is speech data loss caused by packet loss in the network. We present two missing-feature-based approaches that recover lost regions of speech data. These approaches are based on the reconstruction of missing frames or on marginal distributions. For comparison, we also use a packing method, which skips lost data. We evaluate these approaches with packet loss models, i.e., random loss and Gilbert loss models. The results show that the marginal-distribution-based technique is most effective for a packet loss environment; the degradation of word accuracy is only 5% when the packet loss rate is 30% and only 3% when mean burst loss length is 24 frames in the case of DSR front-end. The simple data imputation method is also effective in the case of clean speech.
  • MA Fattah, F Ren, S Kuroiwa
    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 2, PROCEEDINGS 298-302 2004  Peer-reviewed
    A parallel corpus is a very important tool for constructing a good machine translation system or conducting natural language processing research on cross-language information retrieval. The Internet archive is a good source of parallel documents in different languages. In order to construct a good parallel corpus from the Internet archive, a bilingual dictionary that contains word pairs which may not exist in commercial dictionaries is a must. Extracting a bilingual dictionary from Internet parallel documents is important for adding words that are absent from traditional dictionaries. This paper describes two algorithms to automatically extract an English/Arabic bilingual dictionary from parallel texts that exist in the Internet archive. The system should preferably be useful for many different language pairs. As with most existing systems, the accuracy of our system is directly proportional to the number of sentence pairs used. By controlling the system parameters, we could achieve 100% precision for the output bilingual dictionary, but the size of the dictionary will be smaller.
  • Q Liu, X Lu, F Ren, S Kuroiwa
    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 1, PROCEEDINGS 241-245 2004  Peer-reviewed
    Time-series forecasting is an important research area in several domains. Recently, neural networks have been very successfully applied in time series to improve multivariate prediction ability. Several neural network models have already been developed for the market prediction. Some are applied to predicting the change of future interest rate and exchange rate; some are applied to recognizing certain price patterns that are characteristic of future price changes. This paper presents a neural network model for technical analysis of stock market, and its application to a buying and selling timing prediction system for stock index of Japan. This paper also describes a natural language generation system to express prediction information of TOPIX in natural language for non-expert users. This system has evolved to be one of the most comprehensive grammars of English for prediction expressions.
  • Koji Tanaka, Fuji Ren, Shingo Kuroiwa, Satoru Tsuge
    INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004 2004  Peer-reviewed
  • S Kuroiwa, M Naito, M Nakamura, S Sakayori, T Mukasa
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS 87(4) 44-52 2004  Peer-reviewed
    A system that automatically rejects prank calls coming through home country direct from abroad, which is one of the international telephone services, is presented in this paper. Home country direct is a service whereby a user can use international telephone services in his/her native language by directly accessing home country's international station operators. Since this service does not require fees for calling operators, prank calls made by children from abroad pose a problem. Thus, an "automatic prank call rejection system" that determines a legitimate user by instructing him in Japanese to say a specific word, and determining him to be a legitimate user if he repeats this word correctly or determining the call to be a prank call otherwise has been developed. When this system was applied to commercial services, it rejected 94.7% of prank calls. Legitimate users erroneously rejected constituted 0.8%. It has been confirmed that erroneously rejected legitimate users ended up being connected by repeating the word correctly ultimately by hanging up the phone and redialling a number of times. This system has been found to reject about 10,000 prank calls a day when applied to the operations of the KDD International Telephone Center since March 1996. (C) 2004 Wiley Periodicals, Inc.
  • F Ren, K Matsumoto, S Mitsuyoshi, S Kuroiwa, G Lin
    2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-5, CONFERENCE PROCEEDINGS 1666-1672 2003  Peer-reviewed
    In the near future, it will be necessary for senior citizens to nurse other senior citizens because of the declining population of children and the increase in a new type of family. We have been developing welfare robots which can support the lives of senior citizens and have the sensibility to lighten the burden imposed by nursing. The measurement of emotions from ordinary conversation is considered to be one of the basic research topics for this. In this paper, we propose an algorithm for emotion measurement and a prototype system based on this algorithm, and we also discuss its validity.
  • Takeshi Yamada, Jiro Okada, Kazuya Takeda, Norihide Kitaoka, Masakiyo Fujimoto, Shingo Kuroiwa, Kazumasa Yamamoto, Takanobu Nishiura, Mitsunori Mizumachi, Satoshi Nakamura
    8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003 1769-1772 2003  Peer-reviewed
  • Satoru Tsuge, Shingo Kuroiwa, Kenji Kita
    8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003 2003  Peer-reviewed
  • Toshiki Endo, Shingo Kuroiwa, Satoshi Nakamura
    8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003 4 3081-3084 2003  Peer-reviewed
  • S Kuroiwa, S Tsuge
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS 392-395 2003  Peer-reviewed
    In this study we present blind equalization techniques for the ETSI standard Distributed Speech Recognition (DSR) front-end which compensate for acoustic mismatch caused by input devices. The DSR front-end employs vector quantization (VQ) for feature parameter compression, so the mismatch not only causes a shift of parameters but also increases VQ distortion. Although CMS is one of the most effective methods to compensate for the shift, it cannot decrease VQ distortion in DSR. To compensate for the shift and decrease VQ distortion simultaneously, the proposed methods estimate the shift in the input data necessary to match the VQ codebook distribution. The methods do not need the acoustic likelihood which is calculated in a decoder on the server side. Therefore, they are applicable to the DSR front-end. The Japanese Newspaper Article Sentences database (JNAS) was used for the equalization experiments. While the word error rate (WER) for the ETSI standard DSR front-end was 18.6% under the acoustically mismatched condition, our proposed method yielded a rate of 12.3%.
  • Fuji Ren, Kazuyuki Matsumoto, Shunji Mitsuyoshi, Shingo Kuroiwa, Gai Lin
    Proceedings of the IEEE International Conference on Systems, Man and Cybernetics 2 1666-1672 2003
    In the near future, it will be necessary for senior citizens to nurse other senior citizens because of the declining population of children and the increase in a new type of family. We have been developing welfare robots which can support the lives of senior citizens and have the sensibility to lighten the burden imposed by nursing. The measurement of emotions from ordinary conversation is considered to be one of the basic research topics for this. In this paper, we propose an algorithm for emotion measurement and a prototype system based on this algorithm, and we also discuss its validity.
  • MA Fattah, FJ Ren, S Kuroiwa, A Atlam
    Proceedings of the 46th IEEE International Midwest Symposium on Circuits & Systems, Vols 1-3 978-981 2003  Peer-reviewed
    In order to construct a good machine translation system or conduct natural language processing research on cross-language information retrieval, a good parallel corpus is required. The Internet archive contains many parallel documents. To construct a good parallel corpus from the Internet archive, a good bilingual dictionary is required. This paper describes an algorithm to automatically extract an English/Arabic bilingual dictionary from parallel texts that exist in the Internet archive. The system should preferably be useful for many different language pairs. Unlike most existing systems, our system can extract translation pairs from a very small parallel corpus. This new system can extract translations from only two sentences in one language and two sentences in the other language if the requirements of the system are met. Moreover, this system is able to extract word pairs that are translations of each other, as well as the explanation of an Arabic or English word in the other language. The accuracy of the system is 59.1% in the case of one English word translated to one Arabic word, 23.9% in the case of one English word translated to more than one Arabic word (Arabic phrase), and 14.6% in the case of one Arabic word translated to more than one English word (English phrase).
  • Tsuneo Kato, Shingo Kuroiwa, Tohru Shimizu, Norio Higuchi
    Systems and Computers in Japan 33(4) 40-49 April 2002  Peer-reviewed
    Tree-based clustering is an effective method for sharing the state of an HMM in which clustering is applied to a set of context-dependent models with the phoneme context as the splitting condition. In past papers, the method has been restricted to the single Gaussian HMM. The single Gaussian HMM, however, is insufficient for representing the acoustic features, and an adequate topology (sharing of HMM state) will not necessarily be realized. Furthermore, in order to arrive at a state-sharing model with the desired number of mixtures, the process of doubling the number of mixtures and the embedded training must be iterated after the tree-based clustering, which increases the time for training. Consequently, this paper proposes a method in which the tree-based clustering algorithm for the single Gaussian HMM is extended to the clustering of the mixed Gaussian HMM. The proposed method reduces the training time to approximately one-third that of the conventional method of handling the single Gaussian HMM. A recognition experiment using a phone typewriter and a recognition experiment for continuous word demonstrate that the recognition rate is improved by one to two points. © 2002 Wiley Periodicals, Inc. Syst. Comp. Jpn.
  • FJ Ren, HC Shi, S Kuroiwa
    2001 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5 1699-1704 2002  Peer-reviewed
    In this paper, we present a new machine translation (MT) approach using multiple MT engines and sentence partitioning. A multiple engine MT system consists of several MT engines running in parallel, coordinated by a controller. Each engine is implemented using an existing MT technique and has its own characteristics. When translating a sentence, each engine translates it independently. If more than one engine translates the sentence successfully, the controller chooses the best translation according to a combining algorithm implemented using translation statistics. If no engine succeeds in translating the sentence, the controller partitions the sentence, coordinates the engines to translate its constituent simple sentences, and combines the partial translation results into a translation result for the whole input sentence. A complex sentence is partitioned based on conjunctions and punctuation marks such as commas and semicolons. We have developed a multiple engine MT system based on the above approach. The system consists of four independent MT engines. The experiments show that the proposed approach is effective for implementing practical MT systems.
  • S Tsuge, M Shishibori, S Kuroiwa, K Kita
    2001 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5 960-965 2002  Peer-reviewed
    The Vector Space Model (VSM) is a conventional information retrieval model, which represents a document collection by a term-by-document matrix. Since term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise, and it is also difficult to capture the underlying semantic structure. Additionally, the storage and processing of such matrices places great demands on computing resources. Dimensionality reduction is a way to overcome these problems. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are popular techniques for dimensionality reduction based on matrix decomposition; however, they contain both positive and negative values in the decomposed matrices. In the work described here, we use Non-negative Matrix Factorization (NMF) for dimensionality reduction of the vector space model. Since matrices decomposed by NMF only contain non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors. This characteristic of parts-based representation is appealing because it reflects the intuitive notion of combining parts to form a whole. Also, since NMF computation is based on a simple iterative algorithm, it is advantageous for applications involving large matrices. Using the MEDLINE collection, we experimentally showed that NMF offers great improvement over the vector space model.
  • Satoru Tsuge, Shingo Kuroiwa, Masami Shishibori, Fuji Ren, Kenji Kita
    7th International Conference on Spoken Language Processing, ICSLP2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002 2002  Peer-reviewed
  • Masaki Naito, Shingo Kuroiwa, Tsuneo Kato, Tohru Shimizu, Norio Higuchi
    EUROSPEECH 2001 Scandinavia, 7th European Conference on Speech Communication and Technology, 2nd INTERSPEECH Event, Aalborg, Denmark, September 3-7, 2001 1099-1102 2001  Peer-reviewed
  • T Kato, S Kuroiwa, T Shimizu, N Higuchi
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS 493-496 2001  Peer-reviewed
    We propose an efficient mixture Gaussian synthesis method for decision tree based state tying that produces better context-dependent models in a short period of training time. This method makes it possible to handle mixture Gaussian HMMs in decision tree based state tying algorithm, and provides higher recognition performance compared to the conventional HMM training procedure using decision tree based state tying on single Gaussian HMMs. This method also reduces the steps of HMM training procedure because the mixture incrementing process is not necessary. We applied this method to training of telephone speech triphones, and evaluated its effect on Japanese phonetically balanced sentence tasks. Our method achieved a 1 to 2 point improvement in phoneme accuracy and a 67% reduction in training time.
  • Toshiaki Uchibe, Shingo Kuroiwa, Norio Higuchi
    Sixth International Conference on Spoken Language Processing, ICSLP 2000 / INTERSPEECH 2000, Beijing, China, October 16-20, 2000 326-329 2000  Peer-reviewed
  • S Kuroiwa, M Naito, S Yamamoto, N Higuchi
    SPEECH COMMUNICATION 27(2) 135-148 March 1999  Peer-reviewed
    This paper describes speech endpoint detection methods for continuous speech recognition systems used over telephone networks. Speech input to these systems may be contaminated not only by various ambient noises but also by various irrelevant sounds generated by users such as coughs, tongue clicking, lip noises and certain out-of-task utterances. Under these adverse conditions, robust speech endpoint detection remains an unsolved problem. We found, in fact, that speech endpoint detection errors occurred in over 10% of the inputs in field trials of a voice activated telephone extension system. These errors were caused by problems of (1) low SNR, (2) long pauses between phrases and (3) irrelevant sounds prior to task sentences. To solve the first two problems, we propose a real-time speech ending point detection algorithm based on the implicit approach, which finds a sentence end by comparing the likelihood of a complete sentence hypothesis and other hypotheses. For the third problem, we propose a speech beginning point detection algorithm which rejects irrelevant sounds by using likelihood ratio and duration conditions. The effectiveness of these methods was evaluated under various conditions. As a result, we found that the ending point detection algorithm was not affected by long pauses and that the beginning point detection algorithm successfully rejected irrelevant sounds by using phone HMMs that fit the task. Furthermore, a garbage model of irrelevant sounds was also evaluated and we found that the garbage modeling technique and the proposed method compensated for each other's weak points and that the best recognition accuracy was achieved by integrating these methods. (C) 1999 Elsevier Science B.V. All rights reserved.
  • Seiichi Yamamoto, Masaki Naito, Shingo Kuroiwa
    Fifth European Conference on Speech Communication and Technology, EUROSPEECH 1997, Rhodes, Greece, September 22-25, 1997 1997  Peer-reviewed
  • S KUROIWA, K TAKEDA, M NAITO, N INOUE, S YAMAMOTO
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E78D(6) 636-641 June 1995  Peer-reviewed
    We carried out a one-year field trial of a voice-activated automatic telephone exchange service at KDD Laboratories, which has about 200 branch phones. This system has DSP-based continuous speech recognition hardware which can process incoming calls in real time using a vocabulary of 300 words. The recognition accuracy was found to be 92.5% for speech read from a written text under laboratory conditions independent of the speaker. In this paper, we describe the performance of the system obtained as a result of the field trial. Apart from recognition accuracy, there was about 20% error due to out-of-vocabulary input and incorrect detection of speech endpoints which had not been allowed for in the laboratory experiments. Also, we found that the recognition accuracy for actual speech was about 18% lower than for speech read from text even if there were no out-of-vocabulary words. In this paper, we examine error variations for individual data in order to try to pinpoint the cause of incorrect recognition. It was found from experiments on the collected data that the pause model used, filled pause grammar and differences of channel frequency response seriously affected recognition accuracy. With the help of simple techniques to overcome these problems, we finally obtained a recognition accuracy of 88.7% for real data.
  • Kazuya Takeda, Shingo Kuroiwa, Masaki Naito, Seiichi Yamamoto
    Fourth European Conference on Speech Communication and Technology, EUROSPEECH 1995, Madrid, Spain, September 18-21, 1995 1995  Peer-reviewed
  • Kazuya Takeda, Tetsunori Murakami, Shingo Kuroiwa, Seiichi Yamamoto
    The 3rd International Conference on Spoken Language Processing, ICSLP 1994, Yokohama, Japan, September 18-22, 1994 1994  Peer-reviewed
  • Kazuya Takeda, Naomi Inoue, Shingo Kuroiwa, Tomohiro Konuma, Seiichi Yamamoto
    Third European Conference on Speech Communication and Technology, EUROSPEECH 1993, Berlin, Germany, September 22-25, 1993 1993  Peer-reviewed
  • Shingo Kuroiwa, Kazuya Takeda, Naomi Inoue, Izuru Nogaito, Seiichi Yamamoto, Makoto Shozakai, Kunihiko Owa, Masahiko Takahashi, Ryuuji Matsumoto
    Third European Conference on Speech Communication and Technology, EUROSPEECH 1993, Berlin, Germany, September 22-25, 1993 1993  Peer-reviewed
  • Shingo Kuroiwa, Kazuya Takeda, Fumihiro Yato, Seiichi Yamamoto, Kunihiko Owa, Makoto Shozakai, Ryuuji Matsumoto
    The Second International Conference on Spoken Language Processing, ICSLP 1992, Banff, Alberta, Canada, October 13-16, 1992 1992  Peer-reviewed
  • Izuru Nogaito, Masahiko Takahashi, Shingo Kuroiwa, Fumihiro Yato
    Second European Conference on Speech Communication and Technology, EUROSPEECH 1991, Genova, Italy, September 24-26, 1991 1991  Peer-reviewed

MISC

 591

Lectures and Oral Presentations

 30

Works

 5

Research Projects (Joint Research, Competitive Funding, etc.)

 17