Research Achievements

黒岩 眞吾

KUROIWA Shingo (Shingo Kuroiwa)

Basic Information

Affiliation
Professor, Graduate School of Engineering, Chiba University
Degree
Ph.D. (Department of Electronic Engineering, Graduate School of Electro-Communications, The University of Electro-Communications)

Researcher Number
20333510
J-GLOBAL ID
200901017262764603
researchmap Member ID
1000356498

External Links

Career (1 entry)

Papers (125)
  • Manaka Takamizawa, Satoru Tsuge, Yasuo Horiuchi, Shingo Kuroiwa
    KES-HCIS 149-158 2022, Last author, Corresponding author
  • Takashi Shimazui, Taka-aki Nakada, Shingo Kuroiwa, Yuki Toyama, Shigeto Oda
    The American Journal of Emergency Medicine 49 414-416, Feb 2021
  • Toshiyuki Ugawa, Satoru Tsuge, Yasuo Horiuchi, Shingo Kuroiwa
    Human Centred Intelligent Systems 405-413 2021, Last author
  • Tomoki Hosoyama, Masahiro Koto, Masafumi Nishimura, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa
    Innovation in Medicine and Healthcare 171-177, Jun 2020, Last author
  • Masahiro Koto, Tomoki Hosoyama, Masafumi Nishimura, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa
    Proceedings of 2020 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing 311-314, Mar 2020, Refereed, Last author
  • Innovation in Medicine and Healthcare Systems, and Multimedia. Smart Innovation, Systems and Technologies 565-572, Jun 2019, Refereed, Last author
  • 長谷川愛, 堀内靖雄, 黒岩眞吾
    人工知能学会研究会資料 SIG-SLUD-B803 13-18, Mar 2019
  • 黒岩眞吾, 堀内靖雄, 古川大輔, 村西幸代
    電子情報通信学会論文誌A J102-A(2) 1-5, Feb 2019, Refereed
  • International Journal of Biometrics 11(1) 35-49, Jan 2019, Refereed
  • 丸山翔太郎, 黒岩眞吾, 堀内靖雄
    電子情報通信学会論文誌A J102-A(2) 120-123, Jan 2019, Refereed
  • 廣實 真弓, 安 啓一, 黒岩 眞吾, 渡辺 雅子
    言語聴覚研究 15(3) 241-241, Sep 2018
  • 黒岩 眞吾, 村西 幸代, 古川 大輔
    総合リハビリテーション 46(6) 525-531, Jun 2018, Invited
  • Satoru Tsuge, Shingo Kuroiwa, Tomoko Ohsuga, Yuichi Ishimoto
    Proc. Oriental COCOSDA 2018, May 2018, Refereed
  • 古川大輔, 村西幸代, 石渡智一, 長尾圭佑, 根本雅也, 香川哲, 高山亜希子, 山下大貴, 西田昌史, 西村雅史, 黒岩眞吾
    全国自治体病院協議会雑誌 54(4) 617-621, Apr 2018, Invited
  • 古川 大輔, 村西 幸代, 石畑 恭平, 森本 暁彦, 黒岩 眞吾
    地域医療 (57th Special Issue) 471-474, Mar 2018, Invited
  • 黒岩眞吾, 村西幸代, 古川大輔
    コミュニケーション障害学 34(1) 22-28, Apr 2017, Invited
  • Fuming Fang, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, Toshimitsu Musha
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2016 6898031:1-6898031:9 2016, Refereed
    Eye motion-based human-machine interfaces are used to provide a means of communication for those who can move nothing but their eyes because of injury or disease. To detect eye motions, electrooculography (EOG) is used. For efficient communication, the input speed is critical. However, it is difficult for conventional EOG recognition methods to accurately recognize fast, sequentially input eye motions because adjacent eye motions influence each other. In this paper, we propose a context-dependent hidden Markov model- (HMM-) based EOG modeling approach that uses separate models for identical eye motions with different contexts. Because the influence of adjacent eye motions is explicitly modeled, higher recognition accuracy is achieved. Additionally, we propose a method of user adaptation based on a user-independent EOG model to investigate the trade-off between recognition accuracy and the amount of user-dependent data required for HMM training. Experimental results show that when the proposed context-dependent HMMs are used, the character error rate (CER) is significantly reduced compared with the conventional baseline under user-dependent conditions, from 36.0 to 1.3%. Although the CER increases again to 17.3% when the context-dependent but user-independent HMMs are used, it can be reduced to 7.3% by applying the proposed user adaptation method.
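The context-dependent EOG modeling described above parallels triphone modeling in speech recognition. As a rough illustration only (not the authors' code; the unit notation and names are hypothetical), a sequence of eye-motion symbols can be expanded into context-dependent units so that each motion is modeled together with its neighbors:

```python
# Toy sketch: expand a sequence of eye-motion symbols into
# context-dependent units "L-C+R" (motion C with left context L and
# right context R), so adjacent-motion influence is modeled explicitly.

def to_context_dependent(motions, boundary="sil"):
    """Map e.g. ['up','right','down'] to triphone-style units."""
    padded = [boundary] + list(motions) + [boundary]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

units = to_context_dependent(["up", "right", "down"])
# Each unit would get its own HMM, trained on examples of that
# motion occurring in that specific context.
```

Separate models per context increase the number of HMMs to train, which is why the paper also studies user adaptation from a user-independent model.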
  • Haoze Lu, Wenbin Zhang, Yasuo Horiuchi, Shingo Kuroiwa
    International Journal of Biometrics 7(2) 83-96 2015, Refereed
    GMM-UBM supervectors can lead to poor speaker-verification models because of inter-session variability, especially when only a small number of training utterances is available. In this study, we propose a phoneme-dependent method to suppress the inter-session variability. A speaker's model is represented by several phoneme-specific Gaussian mixture models. Each covers an individual phoneme whose inter-session variability is constrained in an inter-session-independent subspace constructed by principal component analysis (PCA), using a corpus uttered by a single speaker recorded over a long period. SVM-based experiments were performed using a large corpus constructed by the National Research Institute of Police Science (NRIPS) to evaluate Japanese speaker recognition, and they demonstrate the improvements gained from the proposed method.
  • 有馬志保, 堀内靖雄, 黒岩眞吾, 古川大輔
    電子情報通信学会論文誌A J98-A(1) 139-142, Jan 2015, Refereed
  • Shizuka Wada, Yasuo Horiuchi, Shingo Kuroiwa
    Music Technology meets Philosophy - From Digital Echos to Virtual Ethos: Joint Proceedings of the 40th International Computer Music Conference, ICMC 2014, and the 11th Sound and Music Computing Conference, SMC 2014, Athens, Greece, September 14-20, 2014, Refereed
  • Takaaki Ishii, Hiroki Komiyama, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 3479-3483 2013, Refereed
    Denoising autoencoder is applied to reverberant speech recognition as a noise robust front-end to reconstruct clean speech spectrum from noisy input. In order to capture context effects of speech sounds, a window of multiple short-windowed spectral frames are concatenated to form a single input vector. Additionally, a combination of short and long-term spectra is investigated to properly handle long impulse response of reverberation while keeping necessary time resolution for speech recognition. Experiments are performed using the CENSREC-4 dataset that is designed as an evaluation framework for distant-talking speech recognition. Experimental results show that the proposed denoising autoencoder based front-end using the short-windowed spectra gives better results than conventional methods. By combining the long-term spectra, further improvement is obtained. The recognition accuracy by the proposed method using the short and long-term spectra is 97.0% for the open condition test set of the dataset, whereas it is 87.8% when a multi condition training based baseline is used. As a supplemental experiment, large vocabulary speech recognition is also performed and the effectiveness of the proposed method has been confirmed.
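The frame-concatenation step described in this abstract can be sketched in a few lines (an illustrative stand-in, not the paper's code; the function name and toy 2-dimensional frames are hypothetical):

```python
# Toy sketch: concatenate each short-windowed spectral frame with its
# neighbors (edge-padded) to form a single context-carrying input
# vector for a denoising autoencoder front-end.

def splice_frames(frames, context=1):
    """Concatenate each frame with `context` neighbors on each side."""
    n = len(frames)
    out = []
    for t in range(n):
        spliced = []
        for d in range(-context, context + 1):
            idx = min(max(t + d, 0), n - 1)  # clamp at the edges
            spliced.extend(frames[idx])
        out.append(spliced)
    return out

spliced = splice_frames([[1, 2], [3, 4], [5, 6]], context=1)
```

Combining short- and long-term spectra, as the paper does, would simply concatenate a second, longer-window feature onto each spliced vector.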
  • Xiuqin Wei, Shingo Kuroiwa, Tomoharu Nagashima, Marian K. Kazimierczuk, Hiroo Sekiya
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS 59(9) 2137-2146, Sep 2012, Refereed
    This paper introduces a push-pull class-E-M power amplifier for achieving low harmonic content and high output power. By applying the push-pull configuration to the class-E-M power amplifier, the proposed amplifier achieves a much lower total harmonic distortion (THD) and about four times higher output power than the conventional single class-E-M power amplifier. Design curves of the proposed amplifiers for satisfying the class-E-M ZVS/ZDVS/ZCS/ZDCS conditions are given. A design example is shown along with the PSpice-simulation and experimental waveforms for a 1-MHz amplifier, considering the MOSFET drain-to-source nonlinear parasitic capacitances, MOSFET switch-on resistances, and the equivalent series resistance of the inductors. The waveforms from the PSpice simulation and circuit experiment satisfied all the switching conditions, which shows the accuracy of the design curves given in this paper and validates the effectiveness of the push-pull class-E-M power amplifier.
  • Fuming Fang, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, Toshimitsu Musha
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 734-737 2012, Refereed
    To provide an efficient means of communication for those who cannot move any muscles except the eyes due to amyotrophic lateral sclerosis (ALS), we are developing a speech synthesis interface based on electrooculogram (EOG) input. EOG is an electrical signal observed through electrodes attached to the skin around the eyes and reflects eye position. A key component of the system is a continuous recognizer for the EOG signal. In this paper, we propose and investigate a hidden Markov model (HMM) based EOG recognizer applying continuous speech recognition techniques. In the experiments, we evaluate the recognition system in both user-dependent and user-independent conditions. It is shown that 96.1% recognition accuracy is obtained for five classes of eye actions by a user-dependent system using six channels. While it is difficult to obtain good performance with a user-independent system, it is shown that maximum likelihood linear regression (MLLR) adaptation helps EOG recognition.
  • Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 5029-5032 2012, Refereed
    Direct likelihood maximization selection (DLMS) selects a subset of language model training data so that likelihood of in-domain development data is maximized. By using recognition hypothesis instead of the in-domain development data, it can be used for unsupervised adaptation. We apply DLMS to iterative unsupervised adaptation for presentation speech recognition. A problem of the iterative unsupervised adaptation is that adapted models are estimated including recognition errors and it limits the adaptation performance. To solve the problem, we introduce the framework of unsupervised cross-validation (CV) adaptation that has originally been proposed for acoustic model adaptation. Large vocabulary speech recognition experiments show that the CV approach is effective for DLMS based adaptation reducing 19.3% of error rate by an initial model to 18.0%.
  • Takahiro Shinozaki, Sadaoki Furui, Yasuo Horiuchi, Shingo Kuroiwa
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) 1-4 2012, Refereed
    For large vocabulary continuous speech recognition, speech decoders treat time sequences with context information using large probabilistic models. The software of such speech decoders tends to be large and complex since it has to handle both the relationships of its component functions and the timing of computation at the same time. In traditional signal processing areas such as measurement and system control, block diagram based implementations are common, where systems are designed by connecting blocks of components. The connections describe the flow of signals, and this framework greatly helps in understanding and designing complex systems. In this research, we show that speech decoders can be effectively decomposed into diagrams or pipelines. Once they are decomposed into pipelines, they can be easily implemented in a highly abstracted manner using a pure functional programming language with delayed evaluation. Based on this perspective, we have re-designed our pure-functional decoder Husky, proposing a new design paradigm for speech recognition systems. In the evaluation experiments, it is shown that it works efficiently for a large vocabulary continuous speech recognition task.
  • Yutaka Ono, Misuzu Otake, Takahiro Shinozaki, Ryuichi Nisimura, Takeshi Yamada, Kenkichi Ishizuka, Yasuo Horiuchi, Shingo Kuroiwa, Shingo Imai
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) 1-4 2012, Refereed
    We are developing the S-CAT computer test system, which will be the first automated adaptive speaking test for Japanese. The speaking ability of examinees is scored using speech processing techniques without human raters. By using computers for the scoring, it is possible to largely reduce the scoring cost and provide a convenient means for language learners to evaluate their learning status. While the S-CAT test has several categories of question items, the open-answer question is technically the most challenging, since examinees freely talk about a given topic or argue something for a given material. For this problem, we proposed to use support vector regression (SVR) with various features. Some of the features rely on a speech recognition hypothesis and others do not. SVR is more robust than multiple regression, and the best result was obtained when 390-dimensional features that combine everything were used. The correlation coefficients between human-rated and SVR-estimated scores were 0.878, 0.847, 0.853, and 0.872 for fluency, accuracy, content, and richness measures, respectively.
  • Amira Abdelwahab, Hiroo Sekiya, Ikuo Matsuba, Yasuo Horiuchi, Shingo Kuroiwa
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING 11(1) 33-53, Jan 2012, Refereed
    Collaborative filtering (CF) is one of the most prevalent recommendation techniques, providing personalized recommendations to users based on their previously expressed preferences and those of other similar users. Although CF has been widely applied in various applications, its applicability is restricted due to the data sparsity, the data inadequateness of new users and new items (cold start problem), and the growth of both the number of users and items in the database (scalability problem). In this paper, we propose an efficient iterative clustered prediction technique to transform user-item sparse matrix to a dense one and overcome the scalability problem. In this technique, spectral clustering algorithm is utilized to optimize the neighborhood selection and group the data into users' and items' clusters. Then, both clustered user-based and clustered item-based approaches are aggregated to efficiently predict the unknown ratings. Our experiments on MovieLens and book-crossing data sets indicate substantial and consistent improvements in recommendations accuracy compared to the hybrid user-based and item-based approach without clustering, hybrid approach with k-means and singular value decomposition (SVD)-based CF. Furthermore, we demonstrated the effectiveness of the proposed iterative technique and proved its performance through a varying number of iterations.
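The core "transform a sparse matrix to a dense one" step can be illustrated with a deliberately simplified stand-in (not the paper's spectral-clustering algorithm; this toy version just fills missing entries with per-item means, and all names are hypothetical):

```python
# Toy sketch: iteratively densify a sparse user-item rating matrix by
# replacing missing entries (None) with the mean of each item's known
# ratings. The paper's method instead predicts within user/item clusters.

def densify(ratings, n_iter=2):
    R = [row[:] for row in ratings]          # copy, keep input intact
    n_users, n_items = len(R), len(R[0])
    for _ in range(n_iter):
        for j in range(n_items):             # item-based pass
            known = [R[i][j] for i in range(n_users) if R[i][j] is not None]
            if known:
                mean = sum(known) / len(known)
                for i in range(n_users):
                    if R[i][j] is None:
                        R[i][j] = mean
    return R

dense = densify([[5, None, 3],
                 [4, 2,    None],
                 [None, 1, 4]])
```

In the paper, clustering restricts each such prediction to similar users and items, which is what makes the dense matrix useful rather than merely smoothed.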
  • Xiuqin Wei, Hiroo Sekiya, Shingo Kuroiwa, Tadashi Suetsugu, Marian K. Kazimierczuk
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS 58(10) 2556-2565, Oct 2011, Refereed
    This paper presents expressions for the waveforms and design equations to satisfy the ZVS/ZDS conditions in the class-E power amplifier, taking into account the MOSFET gate-to-drain linear parasitic capacitance and the drain-to-source nonlinear parasitic capacitance. Expressions are given for power output capability and power conversion efficiency. Design examples are presented along with the PSpice-simulation and experimental waveforms at 2.3 W output power and 4 MHz operating frequency. It is shown from the expressions that the slope of the voltage across the MOSFET gate-to-drain parasitic capacitance during the switch-off state affects the switch-voltage waveform. Therefore, it is necessary to consider the MOSFET gate-to-drain capacitance for achieving the class-E ZVS/ZDS conditions. As a result, the power output capability and the power conversion efficiency are also affected by the MOSFET gate-to-drain capacitance. The waveforms obtained from PSpice simulations and circuit experiments showed the quantitative agreements with the theoretical predictions, which verify the expressions given in this paper.
  • Wenbin Zhang, Haoze Lu, Yasuo Horiuchi, Satoru Tsuge, Kenji Kita, Shingo Kuroiwa
    Journal of Signal Processing 15(4) 275-278, Jul 2011, Refereed
    In text-independent speaker recognition, speech variability and inter-session variability greatly affect recognition accuracy. In this paper, we propose reducing the inter-session variability of speech data by applying a PCA transformation. The proposed method reduced the error rate by 42.6% compared with a conventional MFCC-based method, and by 27.2% compared with an MFB-PCA-based method.
  • Manabu Sasayama, Shingo Kuroiwa, Fuji Ren
    ELECTRONICS AND COMMUNICATIONS IN JAPAN 94(4) 44-54, Apr 2011, Refereed
    Super-function based machine translation (SFBMT), which is a type of example-based machine translation, has a feature which makes it possible to expand the coverage of examples by changing nouns into variables. However, there have been problems extracting entire date/time expressions containing parts of speech other than nouns, because only nouns/numbers were changed into variables. We describe a method of extracting date/time expressions for SFBMT. SFBMT uses noun determination rules to extract nouns and a bilingual dictionary to obtain the correspondence of the extracted nouns between the source and the target languages. In this method, we add a rule to extract date/time expressions and then extract date/time expressions from a Japanese-English bilingual corpus. The evaluation results show that the precision of this method for Japanese sentences is 96.7%, with a recall of 98.2%, and the precision for English sentences is 94.7%, with a recall of 92.7%. (C) 2011 Wiley Periodicals, Inc. Electron Comm Jpn, 94(4): 44-54, 2011; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/ecj.10262
  • Shiori Takenaka, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa
    NLP-KE 2011 - Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering 265-268 2011, Refereed
    A novel speaker verification method is proposed that utilizes pseudo speaker models in the speaker ranking selection (SRS) method. SRS is a recently proposed method that has been shown to give higher performance than the traditional T-norm method. However, the superior performance of SRS is based on utilizing a large number of background speaker models. When a sufficient number of speakers is not available for the background models, the performance of SRS significantly degrades. To achieve higher performance with SRS even when only a small number of speakers are available, we propose to augment the set of background models by adding pseudo speaker models (PSMs). Text-independent speaker verification experiments are performed using a large-scale corpus designed for speaker recognition constructed by the National Research Institute of Police Science (NRIPS) in Japan. It is shown that the proposed SRS based system with PSMs gives a 0.29% equal error rate, which is lower than the 0.46% of the original SRS. The minimum DCF scores by the proposed and the original methods are 0.14 and 0.63, respectively. © 2011 IEEE.
  • Takahiro Fukumori, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Norihide Kitaoka, Takeshi Yamada, Kazumasa Yamamoto, Satoru Tsuge, Masakiyo Fujimoto, Tetsuya Takiguchi, Chiyomi Miyajima, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    Acoustical Science and Technology 32(5) 201-210 2011, Refereed
    We have been distributing a new collection of databases and evaluation tools called CENSREC-4, which is a framework for evaluating distant-talking speech in reverberant environments. The data contained in CENSREC-4 are connected digit utterances as in CENSREC-1. Two subsets are included in the data: "basic data sets" and "extra data sets." The basic data sets are used for evaluating the room impulse response-convolved speech data to simulate the various reverberations. The extra data sets consist of simulated data and corresponding real recorded data. Evaluation tools are presently only provided for the basic data sets and will be delivered for the extra data sets in the future. The task of CENSREC-4 with a basic data set appears simple; however, the results of experiments prove that CENSREC-4 provides a challenging reverberant speech-recognition task, in the sense that a traditional technique to improve recognition and a widely used criterion to represent the difficulty of recognition deliver poor performance. Within this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies. © 2011 The Acoustical Society of Japan.
  • Haoze Lu, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa
    International Journal of Biometrics 2(4) 379-390, Sep 2010, Refereed
    In this paper we propose a text-independent (TI) speaker identification method that suppresses phonetic information by a subspace method, under the assumption that a subspace with large variance in the speech feature space is a 'phoneme-dependent subspace' and its complementary subspace is a 'phoneme-independent subspace'. Principal Component Analysis (PCA) is employed to construct these subspaces. Gaussian Mixture Model (GMM)-based speaker identification experiments using both the phonetic-information-suppressed feature and the conventional Mel-Frequency Cepstrum Coefficient (MFCC) were carried out. As a result, the proposed method has proven effective for decreasing identification error rates. Copyright © 2010 Inderscience Enterprises Ltd.
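The complementary-subspace idea reduces, per dimension removed, to subtracting a feature's projection onto the unwanted direction. A minimal sketch under simplifying assumptions (one known unit vector spanning the "phoneme-dependent" direction; not the paper's full PCA pipeline, and all names are hypothetical):

```python
# Toy sketch: keep only the component of a feature vector lying in the
# complementary ("phoneme-independent") subspace by subtracting its
# projection onto a unit vector u spanning the suppressed direction.

def remove_component(x, u):
    """Subtract from x its projection onto the unit vector u."""
    dot = sum(a * b for a, b in zip(x, u))
    return [a - dot * b for a, b in zip(x, u)]

residual = remove_component([3.0, 4.0], [1.0, 0.0])
```

In practice the suppressed directions would be the top PCA eigenvectors of the speech feature space, removed one by one in the same way.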
  • Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa, Akira Ichikawa
    TEXT, SPEECH AND DIALOGUE 6231 539-+ 2010, Refereed
    The purpose of our study is to develop a spoken dialogue system for in-vehicle appliances. Such a multi-domain dialogue system should be capable of reacting to a change of topic and of recognizing isolated words as well as whole sentences quickly and accurately. We propose a novel recognition method that integrates a sentence, partial words, and phonemes. The degree of confidence is determined by the degree to which recognition results match on these three levels. We conducted speech recognition experiments for in-vehicle appliances. In the case of sentence units, the recognition accuracy was 96.2% by the proposed method and 92.9% by the conventional word bigram. As for word units, the recognition accuracy of the proposed method was 86.2% while that of whole word recognition was 75.1%. Therefore, we conclude that our method can be effectively applied in spoken dialogue systems for in-vehicle appliances.
  • Xiuqin Wei, Hiroo Sekiya, Shingo Kuroiwa, Tadashi Suetsugu, Marian K. Kazimierczuk
    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS 3200-3203 2010, Refereed
    In this paper, we present analytical expressions for the waveforms and design equations for achieving the ZVS/ZDS conditions in the class-E power amplifier, taking into account the gate-to-drain parasitic capacitance of the MOSFET. We also give a design example along with PSpice simulation and experimental results. The voltage waveforms obtained from both the PSpice simulation and the circuit experiment achieved the class-E ZVS/ZDS conditions completely, which verify the analytical expressions. The results in this paper indicate that it is important to consider the effect of the MOSFET gate-to-drain capacitance for achieving the class E ZVS/ZDS conditions. The experimental power conversion efficiency achieved 92.8 % at output power P-o = 4.06 W and operating frequency f = 7 MHz.
  • Satoshi Tamura, Chiyomi Miyajima, Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Tetsuya Takiguchi, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    Auditory-Visual Speech Processing, AVSP 2010, Hakone, Kanagawa, Japan, September 30 - October 3, 2010, 6 2010, Refereed
  • Michihiro Jinnai, Satoru Tsuge, Shingo Kuroiwa, Fuji Ren, Minoru Fukumi
    International Journal of Advanced Intelligence (IJAI) 1(1) 59-88, Nov 2009, Refereed
    We propose a new similarity measure, called the geometric measure, for numerically evaluating the degree of similarity. Euclidean distance and cosine similarity are commonly used as similarity measures, but they do not work well in the presence of noise or distortion. This paper proposes a new mathematical model of similarity that overcomes these drawbacks and improves recognition accuracy. Experiments on vowel recognition with various kinds of noise showed considerable improvement in all cases, outperforming the MFCC-based method.
  • Jiajun Yan, David B. Bracewell, Fuji Ren, Shingo Kuroiwa
    Electronic Notes in Theoretical Computer Science 225(C) 457-468, Jan 2, 2009, Refereed
    Semantic Dependency Analysis (SDA) has extensive applications in Natural Language Processing (NLP). In this paper, an integration of multiple classifiers is presented for SDA of Chinese. A Naive Bayesian Classifier, a Decision Tree and a Maximum Entropy classifier are used in a majority wins voting scheme. A portion of the Penn Chinese Treebank was manually annotated with semantic dependency structure. Then each of the three classifiers was trained on the same training data. All three of the classifiers were used to produce candidate relations for test data and the candidate relation that had the majority vote was chosen. The proposed approach achieved an accuracy of 86% in experimentation, which shows that the proposed approach is a promising one for semantic dependency analysis of Chinese. © 2008 Elsevier B.V. All rights reserved.
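The majority-wins voting scheme described above is simple to state in code. A minimal sketch (not the authors' implementation; the stand-in labels are hypothetical, and the real system votes over candidate semantic dependency relations from a Naive Bayes classifier, a decision tree, and a maximum entropy classifier):

```python
# Toy sketch: majority-wins voting over the predictions of several
# classifiers for one test instance.

from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]

label = majority_vote(["agent", "agent", "patient"])
```

With three classifiers a strict majority always exists unless all three disagree, in which case a tie-breaking policy (e.g. trust the strongest single classifier) would be needed.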
  • Manabu Sasayama, Fuji Ren, Shingo Kuroiwa
    Electronic Notes in Theoretical Computer Science 225(C) 329-340, Jan 2, 2009, Refereed
    Extraction of a large Super-Function (SF) set is one of the most important factors in realizing SF based machine translation. This paper presents a method to automatically extract SFs from a Japanese-English bilingual corpus. The extraction process matches Japanese nouns and English nouns in each bilingual sentence pair using a bilingual dictionary. The experimental results show that this method performs very well in automatically extracting SFs for machine translation. We then discuss a problem of SF based machine translation based on the results of an evaluation experiment using the extracted SFs. © 2008 Elsevier B.V. All rights reserved.
  • David B. Bracewell, Jiajun Yan, Fuji Ren, Shingo Kuroiwa
    Electronic Notes in Theoretical Computer Science 225(C) 51-65, Jan 2, 2009, Refereed
    This paper presents algorithms for topic analysis of news articles. Topic analysis entails category classification and topic discovery and classification. Dealing with news has special requirements that standard classification approaches typically cannot handle. The algorithms proposed in this paper are able to do online training for both category and topic classification as well as discover new topics as they arise. Both algorithms are based on a keyword extraction algorithm that is applicable to any language that has basic morphological analysis tools. As such, both the category classification and topic discovery and classification algorithms can be easily used by multiple languages. Through experimentation the algorithms are shown to have high precision and recall in tests on English and Japanese. © 2008 Elsevier B.V. All rights reserved.
  • Amira Abdelwahab, Hiroo Sekiya, Ikuo Matsuba, Yasuo Horiuchi, Shingo Kuroiwa, Masafumi Nishida
    IEEE NLP-KE 2009: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING 220-+ 2009, Refereed
    The amount of accessible information on the Internet increases every day, and it is becoming greatly difficult to deal with such a huge source of information. Recommender Systems (RS), which are considered powerful tools for Information Retrieval (IR), can access this information efficiently. Unfortunately, recommendation accuracy is seriously affected by the problems of data sparsity and scalability. Additionally, the time needed to produce recommendations is essential in Recommender Systems. Therefore, we propose a proficient dimensionality reduction-based Collaborative Filtering (CF) Recommender System. In this technique, Singular Value Decomposition-free (SVD-free) Latent Semantic Indexing (LSI) is utilized to obtain a reduced data representation, solving the sparsity and scalability limitations. The SVD-free approach also greatly reduces the time and memory usage required for dimensionality reduction by employing the partial symmetric eigenproblem. Moreover, to estimate the optimal number of reduced dimensions, which greatly influences system accuracy, the Particle Swarm Optimization (PSO) algorithm is utilized to obtain it automatically. As a result, the proposed technique greatly increases prediction quality and speed while decreasing memory requirements. To show the efficiency of the proposed technique, we applied it to the MovieLens dataset, and the results were very promising.
  • Yuta Yasugahira, Yasuo Horiuchi, Shingo Kuroiwa
    ACM International Conference Proceeding Series 331-334 2009, Refereed
    To achieve the greater accessibility for deaf people, sign language recognition systems and sign language animation systems must be developed. In Japanese sign language (JSL), previous studies have suggested that emphasis and emotion cause changes in hand movements. However, the relationship between emphasis and emotion and the signing speed has not been researched enough. In this study, we analyzed the hand movement variation in relation to the signing speed. First, we recorded 20 signed sentences at three speeds (fast, normal, and slow) using a digital video recorder and a 3D position sensor. Second, we segmented sentences into three types of components (sign words, transitions, and pauses). In our previous study, we analyzed hand movement variations of sign words in relation to the signing speed. In this study, we analyzed transitions between adjacent sign words by a method similar to that in the previous study. As a result, sign words and transitions showed a similar tendency, and we found that the variation in signing speed mainly caused changes in the distance hands moved. Furthermore, we compared transitions with sign words and found that transitions were slower than sign words. Copyright 2009 ACM.
  • Haruka Okamoto, Satoru Tsuge, Amira Abdelwahab, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 2319-+ 2009, Refereed
    In this paper, we propose a novel speaker verification method which determines whether a claimer is accepted or rejected by the rank of the claimer among a large number of speaker models, instead of using score normalization such as T-norm and Z-norm. The method has advantages over the standard T-norm in speaker verification accuracy. However, it requires as much computation time as T-norm, which needs to calculate likelihoods for many cohort models. Hence, we also discuss a speed-up using a method that selects a cohort subset for each target speaker in the training stage. This data-driven approach can significantly reduce computation, resulting in faster speaker verification decisions. We conducted text-independent speaker verification experiments using the large-scale Japanese speaker recognition evaluation corpus constructed by the National Research Institute of Police Science. As a result, the proposed method achieved an equal error rate of 2.2%, while T-norm obtained 2.7%.
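The rank-based decision rule described above can be sketched in a few lines (an illustrative toy, not the authors' system; the threshold `k`, the score values, and the function name are hypothetical):

```python
# Toy sketch: accept a claimed identity when the claimer's score
# against the claimed model ranks within the top k among the same
# utterance's scores against a pool of background speaker models.

def rank_based_accept(claimed_score, background_scores, k=1):
    """Accept when fewer than k background models outscore the claimed model."""
    rank = 1 + sum(s > claimed_score for s in background_scores)
    return rank <= k

decision = rank_based_accept(0.92, [0.85, 0.88, 0.95], k=2)
```

Unlike T-norm, which normalizes the raw score by cohort statistics, this rule only uses the ordering of scores, which is what makes cohort-subset selection a natural speed-up.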
  • Amira Abdelwahab, Hiroo Sekiya, Ikuo Matsuba, Yasuo Horiuchi, Shingo Kuroiwa
    iiWAS2009 - The 11th International Conference on Information Integration and Web-based Applications and Services 375-379 2009, Refereed
    Collaborative filtering (CF) is one of the most popular recommender system technologies. It tries to identify users that have relevant interests and preferences by calculating similarities among user profiles. The idea behind this method is that, it may be of benefit to one's search for information to consult the preferences of other users who share the same or relevant interests and whose opinion can be trusted. However, the applicability of CF is limited due to the sparsity and cold-start problems. The sparsity problem occurs when available data are insufficient for identifying similar users (neighbors) and it is a major issue that limits the quality of recommendations and the applicability of CF in general. Additionally, the cold-start problem occurs when dealing with new users and new or updated items in web environments. Therefore, we propose an efficient iterative prediction technique to convert user-item sparse matrix to dense one and overcome the cold-start problem. Our experiments with MovieLens and book-crossing data sets indicate substantial and consistent improvements in recommendations accuracy compared with item-based collaborative filtering, singular value decomposition (SVD)-based collaborative filtering and semi explicit rating collaborative filtering. © 2010 ACM.
  • Kuroiwa Shingo, Tsuge Satoru, Ren Fuji
    International Journal of Biomedical Soft Computing and Human Sciences: the official journal of the Biomedical Fuzzy Systems Association 14(1) 3-10 2009年  
    Recently, Distributed Speech Recognition (DSR) systems have been widely deployed in Japanese cellular telephone networks. In these systems, personal authentication by voice is strongly desired. In this paper, we present several speaker recognition techniques developed at the University of Tokushima for Distributed Speaker Identification/Verification (DSI/DSV) systems. In particular, we present recent progress on a non-parametric speaker recognition system that is more robust to quantization in distributed systems than conventional speaker recognition systems based on the Gaussian Mixture Model (GMM). Evaluation results using the de facto standard Japanese speaker recognition corpus and the CCC Speaker Recognition Evaluation 2006 data developed by the Chinese Corpus Consortium (CCC) show that the proposed method outperforms GMM and VQ-distortion methods in the European Telecommunications Standards Institute (ETSI) DSR standard environment.
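A non-parametric, frame-based speaker score of the general kind discussed here can be sketched as a nearest-neighbour distance between test frames and each speaker's enrolment frames; the squared-Euclidean distance and the toy 2-D features are assumptions for illustration, not the authors' exact formulation.

```python
def nn_score(test_frames, enrol_frames):
    """Non-parametric score: mean squared distance from each test frame
    to its nearest enrolment frame (lower means more similar)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(min(d2(t, e) for e in enrol_frames)
               for t in test_frames) / len(test_frames)

def identify(test_frames, enrolled):
    """enrolled: {speaker_id: enrolment frames}; pick the closest speaker."""
    return min(enrolled, key=lambda spk: nn_score(test_frames, enrolled[spk]))

enrolled = {"spk_A": [(0.0, 0.0), (0.1, 0.1)],
            "spk_B": [(5.0, 5.0), (5.1, 4.9)]}
print(identify([(0.05, 0.05), (0.0, 0.2)], enrolled))  # spk_A
```

Because no parametric density is fitted, quantized features shift the distances only locally, which is one intuition for robustness relative to a GMM fitted on unquantized data.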
  • Satoru Tsuge, Daichi Koizumi, Minoru Fukumi, Shingo Kuroiwa
    2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ISPACS 2009) 449-452 2009年  査読有り
    Recently, new sensors such as bone-conductive microphones, throat microphones, and non-audible murmur (NAM) microphones have been developed for collecting speech data, in addition to conventional condenser microphones. Accordingly, researchers have begun to study speaker and speech recognition using speech data collected by these new sensors. We focus on bone-conduction speech data collected by a bone-conductive microphone. In this paper, we first investigate the speaker verification performance of bone-conduction speech. In addition, we propose a method that uses bone-conduction and air-conduction speech together for speaker verification. The proposed method integrates the similarity calculated from the air-conduction speech model with the similarity calculated from the bone-conduction speech model. We conducted speaker verification experiments using speech data from 99 female speakers. Experimental results show that the speaker verification performance of bone-conduction speech is lower than that of air-conduction speech. However, the proposed method improves on both the bone- and air-conduction results: it reduces the equal error rate of air-conduction speech by 16.0% and that of bone-conduction speech by 71.7%.
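The integration of the two similarity scores, and the equal error rate used to measure it, can be sketched as follows. The linear fusion form, the example weight, and the toy scores are illustrative assumptions rather than the paper's exact configuration.

```python
def fuse_scores(air_score, bone_score, air_weight=0.7):
    """Linear fusion of air- and bone-conduction similarity scores.
    The linear form and the 0.7 default weight are assumptions."""
    return air_weight * air_score + (1.0 - air_weight) * bone_score

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep a threshold over all observed scores and
    return (FAR + FRR) / 2 at the point where the two rates are closest."""
    best_gap, eer = float("inf"), None
    for t in sorted(genuine + impostor):
        far = sum(1 for s in impostor if s >= t) / len(impostor)
        frr = sum(1 for s in genuine if s < t) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

genuine = [fuse_scores(0.9, 0.8), fuse_scores(0.8, 0.9), fuse_scores(0.7, 0.6)]
impostor = [fuse_scores(0.3, 0.4), fuse_scores(0.2, 0.1), fuse_scores(0.4, 0.3)]
print(equal_error_rate(genuine, impostor))  # 0.0: toy scores separate perfectly
```

Reported EER reductions like those above are relative: a drop from 10% to 8.4% is a 16% error reduction rate.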
  • Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    Acoustical Science and Technology 30(5) 363-371 2009年  査読有り
    Voice activity detection (VAD) plays an important role in speech processing in noisy environments, including speech recognition, speech enhancement, and speech coding. We have developed an evaluation framework for VAD in noisy environments, named CENSREC-1-C. We designed this framework for simple isolated utterance detection; hence, it consists of noisy continuous digit utterances and evaluation tools for VAD results. We define two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance. We also provide the evaluation results of a power-based VAD method as a reference. ©2009 The Acoustical Society of Japan.
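A power-based VAD of the kind used as the reference above can be sketched as a frame-energy threshold against an estimated noise floor; the frame length, the minimum-energy noise estimate, and the threshold ratio below are illustrative assumptions, not CENSREC-1-C's actual reference settings.

```python
def frame_energies(samples, frame_len=160):
    """Split a signal into non-overlapping frames; return mean power per frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def power_vad(samples, frame_len=160, ratio=3.0):
    """Mark each frame as speech (True) when its power exceeds `ratio`
    times the estimated noise floor (the minimum frame power)."""
    energies = frame_energies(samples, frame_len)
    noise_floor = min(energies) if energies else 0.0
    return [e > ratio * noise_floor for e in energies]

signal = [0.01] * 320 + [1.0] * 320 + [0.01] * 320  # silence, speech, silence
print(power_vad(signal))  # [False, False, True, True, False, False]
```

Frame-level evaluation compares such per-frame decisions against reference labels, while utterance-level evaluation checks whether each whole utterance segment was detected.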
  • Takanobu Nishiura, Masato Nakayama, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 1828-1834 2008年  査読有り
    Recently, speech recognition performance has been drastically improved by statistical methods and huge speech databases. Attention is now focused on improving performance in realistic environments such as noisy conditions. Since October 2001, our working group of the Information Processing Society of Japan has been working on evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks including databases and evaluation tools called CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected digits recognition), CENSREC-3 (in-car isolated word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we introduce a new collection of databases and evaluation tools named CENSREC-4, an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface; therefore, we measured room impulse responses to investigate reverberant speech recognition. The results of evaluation experiments showed that CENSREC-4 is an effective database for evaluating new dereverberation methods, because traditional dereverberation processes have difficulty sufficiently improving recognition performance. The framework was released in March 2008, and many studies are being conducted with it in Japan.
  • Masaru Maebatake, Iori Suzuki, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION 478-481 2008年  査読有り
    In sign language, hand positions and movements represent the meaning of words. Hence, we have been developing sign language recognition methods that use both hand positions and movements. In previous studies, however, each feature was given the same weight when calculating the probability for recognition. In this study, we propose a sign language recognition method using a multi-stream HMM technique to show the relative importance of position and movement information for sign language recognition. We conducted recognition experiments using 21,960 sign language word data. As a result, 75.6% recognition accuracy was obtained with the appropriate weights (position:movement = 0.2:0.8), while 70.6% was obtained with equal weights. From this result, we conclude that hand movement is more important for sign language recognition than hand position. In addition, we conducted experiments to determine the optimal number of states and mixtures; the best accuracy was obtained with 15 states and two mixtures for each word HMM.
  • Satoru Tsuge, Takashi Osanai, Hisanori Makinae, Toshiaki Kamada, Minoru Fukumi, Shingo Kuroiwa
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 1929-+ 2008年  査読有り
    Recently, new sensors such as bone-conductive microphones, throat microphones, and non-audible murmur (NAM) microphones have been developed for collecting speech data, in addition to conventional condenser microphones. Accordingly, researchers have begun to study speaker and speech recognition using speech data collected by these new sensors. We focus on bone-conduction speech data collected by a bone-conductive microphone. This paper proposes a novel speaker identification method that combines "bone-conduction speech" and "air-conduction speech". The proposed method conducts speaker identification by integrating the similarity calculated from the air-conduction speech model with the similarity calculated from the bone-conduction speech model. To evaluate the proposed method, we conducted speaker identification experiments using part of a large bone-conduction speech corpus constructed by the National Research Institute of Police Science, Japan (NRIPS). Experimental results show that the proposed method reduces the identification error rates of both air-conduction and bone-conduction speech; in particular, the average error reduction rate relative to air-conduction speech is 35.8%.

MISC

 590

講演・口頭発表等

 30

Works(作品等)

 5

共同研究・競争的資金等の研究課題

 17