Yasuo Horiuchi (堀内靖雄, ホリウチヤスオ)

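Basic information

Affiliation: Associate Professor, Graduate School of Informatics, Chiba University

Degree: Doctor of Engineering (March 1995, Tokyo Institute of Technology)

J-GLOBAL ID: 200901021029331583
researchmap member ID: 1000191929

Research areas

Information and communications / Intelligent informatics

Awards

Major papers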

Determining the base frequency of the F0 contour generation model for the diverse expression of speech

Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno

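Acoustical Science and Technology 46(1), January 2025 (peer-reviewed)
A consideration of the factors that form the functions common to "dialogue language"

市川熹, 長嶋祐二, 堀内靖雄

The Journal of the Acoustical Society of Japan 80(7) 355-366, July 2024 (peer-reviewed)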
Constructing a Highly Accurate Japanese Sign Language Motion Database Including Dialogue

Yuji Nagashima, Keiko Watanabe, Daisuke Hara, Yasuo Horiuchi, Shinji Sako, Akira Ichikawa

Communications in Computer and Information Science 76-81, June 2020 (peer-reviewed)
Discussion of a Japanese sign language database and its annotation systems with consideration for its use in various areas

Shinji Sako, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Keiko Watanabe, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

Proceedings of LingCologne 2019, June 6, 2019 (peer-reviewed)
Construction of a Japanese Sign Language Database with Various Data Types

Keiko Watanabe, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Shinji Sako, Akira Ichikawa

Communications in Computer and Information Science 317-322, 2019 (peer-reviewed)
Constructing a Japanese Sign Language Multi-Dimensional Database

Yuji Nagashima, Daisuke Hara, Shinji Sako, Keiko Watanabe, Yasuo Horiuchi, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

The 7th Meeting of Signed and Spoken Language Linguistics (SSLL 2018), September 28, 2018 (peer-reviewed)
The structure of "dialogue language" with a light mental burden

市川熹, 堀内靖雄, 長嶋祐二

Transactions of Human Interface Society 20(2) 191-204, 2018 (peer-reviewed)

We have presented experimental results on the prosody of languages used in real-time dialogue, such as speech, sign language, and finger braille. These results are discussed together with related research from inside and outside Japan. Based on them, we examine a structure that enables real-time dialogue with a light mental burden. Furthermore, we propose a model that makes real-time dialogue possible by elucidating the information structures of these languages. The proposed model can explain various phenomena in real-time dialogue.


MISC

559 items

A cohort speaker selection method for speaker verification using rank statistics (Speech)

岡本悠, 柘植覚, 堀内靖雄

IEICE Technical Report 109(356) 153-158, December 21, 2009
A study on sentence boundary detection focusing on speech recognition confidence

畑昇吾, 西田昌史, 堀内靖雄, 黒岩眞吾

Spoken Language Processing (SLP) 2009(20) 1-6, December 14, 2009

Since natural language processing (NLP) operates on semantically coherent units such as sentences, sentence boundaries must be marked in automatic speech recognition (ASR) output. In this paper, we first propose a way to construct a feature space suited to sentence boundary detection with a support vector machine (SVM) by taking into account how likely each word is to appear immediately before a sentence boundary. Second, we examine using the word confidence measures that the recognizer outputs along with its results as features for sentence boundary detection. We evaluated the methods with an SVM on the Corpus of Spontaneous Japanese (CSJ).
A cohort speaker selection method for speaker verification using rank statistics

岡本悠, 柘植覚, 堀内靖雄, 黒岩眞吾

Spoken Language Processing (SLP) 2009(27) 1-6, December 14, 2009

In this paper, we introduce a speaker verification method that decides whether to accept or reject a claimant from the claimant's rank among a large number of speaker models, instead of using score normalization such as T-norm or Z-norm. The method is more accurate than standard T-norm but, like T-norm, it is computationally expensive because likelihoods must be calculated for many cohort models. We therefore also propose a way to reduce the verification cost by selecting cohort speakers for each target speaker at the training stage, based on the likelihood ranks of the speaker models (GMMs) registered in the system. We conducted text-independent speaker verification experiments on the large-scale bone-conducted speech database of speakers constructed by the National Research Institute of Police Science, using the air-conducted speech of the 283 male speakers recorded in it. The proposed method achieved a minDCF of 0.0098 with an average of 57 cohort speakers and 0.0094 with 101, comparable to the 0.0092 obtained with 282 cohort speakers, whereas T-norm obtained 0.0154.
A cohort speaker selection method for speaker verification using rank statistics

岡本悠, 柘植覚, 堀内靖雄, 黒岩眞吾

IEICE Technical Report 109(355(NLC2009 12-32)) 153-158, December 14, 2009

A study on sentence boundary detection focusing on speech recognition confidence

畑昇吾, 西田昌史, 堀内靖雄, 黒岩眞吾

IEICE Technical Report 109(355(NLC2009 12-32)) 111-116, December 14, 2009

Report on Human Interface Symposium 2009

椎尾一郎, 竹内勇剛, 鱗原晴彦, 井野秀一, 綿貫啓子, 竹内勇剛, 西山敏樹, 小林茂, 吉池佑太, 藤田欣也, 森麻紀, 杉原太郎, 橋爪絢子, 亀山研一, 岸野文郎, 福本雅朗, 加藤博一, 北村喜文, 渡辺富夫, 間瀬健二, 葛岡英明, 浅野陽子, 伊藤潤, 宮田一乘, 水口充, 中川正樹, 竹田仰, 美馬義亮, 郷健太郎, 辻野嘉宏, 大久保雅史, 井野秀一, 岡本明, 堀内靖雄, 塩瀬隆之, 土井美和子, 竹林洋一, 藤田欣也, 下田宏, 岡田美智男, 小嶋弘行, 清川清, 森川治, 小谷賢太郎, 大須賀美恵子, 和氣早苗, 大倉典子, 仲谷善雄, 志堂寺和則, 福住伸一, 高橋信, 辛島光彦, 角康之, 雨宮智浩

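Journal of Human Interface Society 11(4) 265-277, November 25, 2009
A study of explanatory-word selection methods in kana-kanji conversion using semantic information for visually impaired users

小宮菜月, 西田昌史, 堀内靖雄, 黒岩眞吾

IEICE Technical Report 109(259(SP2009 49-61)) 69-74, October 22, 2009

We have been studying a kana-kanji conversion method for visually impaired users that uses semantic information. Synonyms are mainly used as explanatory words, but there is no criterion for choosing which of several synonyms to use. In this study, a subjective evaluation experiment on the semantic closeness between converted words and their synonyms showed that synonyms with higher word familiarity were rated as semantically closer. This result indicates that a selection method based on word familiarity is appropriate.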
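A study of explanatory-word selection methods in kana-kanji conversion using semantic information for visually impaired users

小宮菜月, 西田昌史, 堀内靖雄, 黒岩眞吾

IEICE Technical Report. WIT, Well-being Information Technology 109(260) 69-74, October 22, 2009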

Construction of a spoken dialogue system that anticipates recognition errors

新谷秀和, 西田昌史, 堀内靖雄, 黒岩眞吾

Proceedings of the IEEJ Electronics, Information and Systems Society Conference (CD-ROM) 2009, paper no. MC4-2, September 3, 2009
A study of website usability and search efficiency for totally blind users

飯塚潤一, 岡本明, 堀内靖雄, 市川熹

Proceedings of the Forum on Information Technology 8(3) 561-562, August 20, 2009
An accompaniment system using breath cues during a piece

大三川晴香, 堀内靖雄, 西田昌史, 黒岩眞吾

IPSJ SIG Technical Reports (CD-ROM) 2009(2), paper no. MUS-NO.81(26), August 15, 2009
An accompaniment system using breath cues during a piece

大三川晴香, 堀内靖雄, 西田昌史, 黒岩眞吾

Research Report, Music Information Science (MUS) 2009(26) 1-6, July 22, 2009

We have been developing an accompaniment system that uses the soloist's breath cues as an interface for controlling the accompaniment. In our previous study, we proposed a system that could use breath cues at the beginning of a piece. In this study, we propose a method that can use breath cues not only at the beginning but also during a piece. We implemented the system and evaluated it with human performers. The results showed that using breath cues reduced the timing gap between the soloist and the system and was also rated higher in the performers' subjective evaluation.
Analysis of overlap phenomena at turn-taking in Japanese Sign Language dialogue

斉藤涼子, 堀内靖雄, 西田昌史, 黒岩眞吾

Human Interface Society Research Reports: Human Interface 11(2) 195-200, May 14, 2009
Analysis of overlap phenomena at turn-taking in Japanese Sign Language dialogue

斉藤涼子, 堀内靖雄, 西田昌史, 黒岩眞吾

IEICE Technical Report 109(29(WIT2009 1-47)) 195-200, May 7, 2009

In this research, we analyzed overlap phenomena at turn-taking points in Japanese Sign Language dialogue. Spontaneous dialogues were recorded in an environment where the signers could see each other via prompters, and three dialogues by six native signers were analyzed. First, overlaps at turn-taking points occurred very frequently (75%). Second, we analyzed these phenomena on the basis of the "turn-taking system for conversation" of H. Sacks, E. A. Schegloff, and G. Jefferson. We found situations in which the current speaker (signer) continued the utterance after a transition-relevance place (TRP) while the next speaker, recognizing or projecting the TRP, started his or her turn, so that an overlap occurred. We regard these types of overlap as normal turn-taking. Finally, the turn-taking rule was broken in only a few cases (18%); the other cases followed the rule.
Analysis of overlap phenomena at turn-taking in Japanese Sign Language dialogue

斉藤涼子, 堀内靖雄, 西田昌史, 黒岩眞吾

IEICE Technical Report. WIT, Well-being Information Technology 109(29) 195-200, May 7, 2009

Analysis of overlap phenomena at turn-taking in Japanese Sign Language dialogue

斉藤涼子, 堀内靖雄, 西田昌史, 黒岩眞吾

IEICE Technical Report. HCS, Human Communication Science 109(27) 195-200, May 7, 2009

An accompaniment system that detects breath cues

堀内靖雄, 西田昌史, 市川熹

IPSJ Journal 50(3) 1079-1089, March 15, 2009

With previous computer accompaniment systems, it was difficult to handle pieces in which the machine must begin the accompaniment simultaneously with the human soloist, which made such systems hard for soloists to use. In an actual ensemble, human performers generally use breath as a musical cue. In this study, to realize an accompaniment system that detects the soloist's breath cue and starts playing together with the soloist, we (1) analyzed the timing relation between the breath cue and the onset of the soloist's first note, (2) proposed a method for automatically detecting the breath cue, and (3) implemented the system and evaluated it with human performers. An investigation of the timing difference between the system and the soloist that human performers can tolerate suggested a range from about -60 ms (system delayed) to 97 ms (system early). Evaluated against this range, the system achieved synchronization within it for 94.6% of 240 experimental data, showing that it can start playing from the breath cue with sufficient accuracy.
An accompaniment system that detects breath cues

堀内靖雄, 西田昌史, 市川熹

IPSJ Journal (CD-ROM) 50(3) 1079-1089, March 15, 2009

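On editing the special issue "Music Information Processing"

堀内靖雄

IPSJ Journal 50(3) 1053-1053, March 15, 2009
Speaker recognition based on feature transformation that suppresses phonetic variation

LU Haoze, 西田昌史, 堀内靖雄, 黒岩眞吾

Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) 2009, paper no. 3-Q-1, March 10, 2009
An accompaniment system that detects breath cues (Special issue: Music information processing)

堀内靖雄, 西田昌史, 市川熹

IPSJ Journal 50(3) 1079-1089, March 2009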
Analysis of utterance impressions from speech and text with the aim of annotating transcriptions

西田昌史, 堀内靖雄, 黒岩眞吾, 市川熹

IPSJ Journal 50(2) 460-468, February 15, 2009

In recent years, a great deal of research has been done on systems that automatically transcribe speech. This research has generally focused on transcribing utterances accurately. What is needed, however, is a transcription that makes the content of a discussion easier for readers to understand. Linguistic information alone is considered insufficient for this purpose; information such as the setting of the discussion and the speaker's intentions and emotions is also required. In this study, aiming to annotate transcriptions of meetings and discussions with utterance intentions, we analyzed impressions of utterances from both text and speech. First, we focused on text decoration, such as changes in character weight and size, and on symbols such as "!" and "?", and investigated through a subjective evaluation experiment how strongly impressions such as "doubt" and "surprise" are felt when these changes are applied to a transcription. We ran a similar subjective evaluation for speech and used multiple linear regression with prosodic parameters such as F0 and power to analyze the relation between prosody and utterance impressions. As a result, we clarified the relationships between text variations, prosodic parameters, and utterance impressions. A combined analysis further showed that some impressions are received differently from text and from speech, while others show the same tendency.
Analysis of utterance impressions from speech and text with the aim of annotating transcriptions

西田昌史, 堀内靖雄, 黒岩眞吾, 市川熹

IPSJ Journal (CD-ROM) 50(2) 460-468, February 15, 2009

Analysis of utterance impressions from speech and text with the aim of annotating transcriptions (Special issue: Spoken document processing)

西田昌史, 堀内靖雄, 黒岩眞吾

IPSJ Journal 50(2) 460-468, February 2009
Considerations of Efficiency and Mental Stress of Search Tasks on Websites by Blind Persons.

Junichi Iizuka, Akira Okamoto, Yasuo Horiuchi, Akira Ichikawa

UNIVERSAL ACCESS IN HUMAN-COMPUTER INTERACTION: APPLICATIONS AND SERVICES, PT III 5616 693-700, 2009

We examined what kind of rating index is usable for verifying the usability of websites for blind persons. Search time correlated strongly with NASA-TLX WWL scores, which suggests that usability can be evaluated by search time. In contrast, the results of an accessibility check tool showed no correlation with NASA-TLX WWL scores, so the tool could not be used to verify usability. A new usability check tool for blind persons must be developed. Placing frequently used and important functions at the top of a website, where a user can easily recognize them, not only gives them high visibility but is also effective for blind persons using a voice-output web browser.
Text-independent speaker verification using rank threshold in large number of speaker models.

Haruka Okamoto, Satoru Tsuge, Amira Abdelwahab, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 2367-2370, 2009

In this paper, we propose a novel speaker verification method that decides whether a claimant is accepted or rejected from the claimant's rank among a large number of speaker models, instead of using score normalization such as T-norm and Z-norm. The method is more accurate than standard T-norm. However, like T-norm, it requires much computation time because likelihoods must be calculated for many cohort models. Hence, we also discuss a speed-up that selects a cohort subset for each target speaker at the training stage. This data-driven approach can significantly reduce computation, resulting in faster verification decisions. We conducted text-independent speaker verification experiments using a large-scale Japanese speaker recognition evaluation corpus constructed by the National Research Institute of Police Science. The proposed method achieved an equal error rate of 2.2%, while T-norm obtained 2.7%.
Analysis of hand movement variation related to speed in Japanese sign language.

Yuta Yasugahira, Yasuo Horiuchi, Shingo Kuroiwa

ACM International Conference Proceeding Series 331-334, 2009

To achieve the greater accessibility for deaf people, sign language recognition systems and sign language animation systems must be developed. In Japanese sign language (JSL), previous studies have suggested that emphasis and emotion cause changes in hand movements. However, the relationship between emphasis and emotion and the signing speed has not been researched enough. In this study, we analyzed the hand movement variation in relation to the signing speed. First, we recorded 20 signed sentences at three speeds (fast, normal, and slow) using a digital video recorder and a 3D position sensor. Second, we segmented sentences into three types of components (sign words, transitions, and pauses). In our previous study, we analyzed hand movement variations of sign words in relation to the signing speed. In this study, we analyzed transitions between adjacent sign words by a method similar to that in the previous study. As a result, sign words and transitions showed a similar tendency, and we found that the variation in signing speed mainly caused changes in the distance hands moved. Furthermore, we compared transitions with sign words and found that transitions were slower than sign words. Copyright 2009 ACM.
Collaborative filtering based on an iterative prediction method to alleviate the sparsity problem.

Amira Abdelwahab, Hiroo Sekiya, Ikuo Matsuba, Yasuo Horiuchi, Shingo Kuroiwa

iiWAS2009 - The 11th International Conference on Information Integration and Web-based Applications and Services 375-379, 2009

Collaborative filtering (CF) is one of the most popular recommender system technologies. It tries to identify users that have relevant interests and preferences by calculating similarities among user profiles. The idea behind this method is that, it may be of benefit to one's search for information to consult the preferences of other users who share the same or relevant interests and whose opinion can be trusted. However, the applicability of CF is limited due to the sparsity and cold-start problems. The sparsity problem occurs when available data are insufficient for identifying similar users (neighbors) and it is a major issue that limits the quality of recommendations and the applicability of CF in general. Additionally, the cold-start problem occurs when dealing with new users and new or updated items in web environments. Therefore, we propose an efficient iterative prediction technique to convert user-item sparse matrix to dense one and overcome the cold-start problem. Our experiments with MovieLens and book-crossing data sets indicate substantial and consistent improvements in recommendations accuracy compared with item-based collaborative filtering, singular value decomposition (SVD)-based collaborative filtering and semi explicit rating collaborative filtering. © 2010 ACM.
An Efficient Collaborative Filtering Algorithm using SVD-free Latent Semantic Indexing and Particle Swarm Optimization

Amira Abdelwahab, Hiroo Sekiya, Ikuo Matsuba, Yasuo Horiuchi, Shingo Kuroiwa, Masafumi Nishida

IEEE NLP-KE 2009: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING 220-+, 2009

The amount of information accessible on the Internet increases every day, and it has become very difficult to deal with such a huge source of information. Recommender systems (RS), which are considered powerful tools for information retrieval (IR), can access this information efficiently. Unfortunately, recommendation accuracy is seriously affected by the problems of data sparsity and scalability, and recommendation time is also essential in recommender systems. We therefore propose an efficient collaborative filtering (CF) recommender system based on dimensionality reduction. In this technique, singular value decomposition-free (SVD-free) latent semantic indexing (LSI) is used to obtain a reduced data representation that addresses the sparsity and scalability limitations. The SVD-free approach also greatly reduces the time and memory required for dimensionality reduction by employing the partial symmetric eigenproblem. Moreover, the particle swarm optimization (PSO) algorithm is used to automatically estimate the optimal number of reduced dimensions, which greatly influences system accuracy. As a result, the proposed technique greatly increases prediction quality and speed and decreases memory requirements. To show its efficiency, we applied it to the MovieLens dataset, and the results were very promising.
Sign language recognition using multi-stream HMMs based on position and movement

西田昌史, 前畠大, 鈴木いおり, 堀内靖雄, 黒岩眞吾

IEEJ Transactions on Electronics, Information and Systems (C) 129(10) 1902-1907, 2009

To establish a universal communication environment, computer systems should recognize communication languages in various modalities. Conventional sign language recognition is performed word by word using gesture information such as hand shape and movement, and each feature is given the same weight when calculating the recognition probability. We consider hand position very important for sign language recognition, since the meaning of a word differs according to hand position. In this study, we propose a sign language recognition method that uses a multi-stream HMM technique to show the importance of position and movement information. We conducted recognition experiments using 28,200 sign language word data. Recognition accuracy was 82.1% with the appropriate weights (position:movement = 0.2:0.8), compared with 77.8% with equal weights. These results demonstrate that more weight should be put on movement than on position in sign language recognition. © 2009 The Institute of Electrical Engineers of Japan.
Sign language recognition using multi-stream HMMs based on position and movement

西田昌史, 前畠大, 鈴木いおり, 堀内靖雄, 黒岩眞吾

The Transactions of the Institute of Electrical Engineers of Japan. C, A publication of Electronics, Information and System Society 129(10) 1902-1907, 2009

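Speech information processing toward a barrier-free society, 5: Applying findings from speech research to information welfare

中園薫, 長嶋祐二, 堀内靖雄

The Journal of the Institute of Electronics, Information and Communication Engineers 91(12) 1036-1041, December 1, 2008

Technical support for people with hearing impairments is still insufficient. Applying the knowledge gained in speech research is expected to make research on information-welfare technology more efficient and to lead to fuller support. This article focuses on sign language, used mainly by people with hearing impairments, and finger braille, used by deafblind people, who have both visual and hearing impairments. We first introduce examples of neurophysiological research on the mechanisms of sign language and speech perception. Next, using speech evaluation techniques as a guide, we describe methods for evaluating the sign language video quality needed for sign language communication over videophones, and the effects of the video delay that is unavoidable with videophones. We also show that finger braille, like speech, carries prosodic information such as speed, pauses, and intensity.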
Applying findings from speech research to information welfare

中園薫, 長嶋祐二, 堀内靖雄

The Journal of the Institute of Electronics, Information and Communication Engineers 91(12) 1036-1041, December 1, 2008

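Confidence estimation of recognition results by combining recognizers with different recognition units

真柄皓介, 西田昌史, 堀内靖雄, 黒岩眞吾

Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) 2008, paper no. 3-Q-26, September 3, 2008
A survey of the structuring of large-scale websites focusing on relationships between web pages: A study of accessibility for voice browser use

大瀧万希子, 堀内靖雄, 西田昌史, 黒岩眞吾

IEICE Technical Report. WIT, Well-being Information Technology 108(170) 39-44, July 20, 2008

Large websites that hold a great deal of information need to structure it with accessibility and usability in mind, both to guarantee access to the desired information and to allow efficient searching. In this study, we analyzed how information is structured on existing large-scale websites from the viewpoint of tree structure. We confirmed that most information lies within about five levels of the hierarchy, but the top two levels contain a very large number of links, which is a problem for accessibility. We also found a large number of links that deviate from the tree structure (redundant links). Some of these assist navigation and others indicate multiple search paths, and we made proposals for making them easier for visually impaired users to understand with a voice browser.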
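Recognition of sign language words by HMMs focusing on hand position and movement (Data engineering)

前畠大, 西田昌史, 堀内靖雄, 黒岩眞吾

IEICE Technical Report 108(93) 7-12, June 19, 2008

In sign language, the meaning of a word differs depending on hand position. In this study, we therefore focused on hand position and movement and examined a method for recognizing sign language words that integrates the two. Hand positions were normalized per utterance, frame differences of the position coordinates were added to the features as movement, and each word was modeled with an HMM. Considering that the relative importance of position and movement differs from word to word, we also performed recognition with multi-stream HMMs to analyze it. Recognition accuracy was 67.1% with position coordinates alone and 79.9% when movement was added. Furthermore, varying the stream weights of the multi-stream HMM showed that movement is more important than position.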
Recognition of sign language words by HMMs focusing on hand position and movement (Pattern recognition and media understanding)

前畠大, 西田昌史, 堀内靖雄, 黒岩眞吾

IEICE Technical Report 108(94) 7-12, June 19, 2008

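Recognition of sign language words by HMMs focusing on hand position and movement

前畠大, 西田昌史, 堀内靖雄, 黒岩眞吾

IEICE Technical Report 108(93(DE2008 1-29)) 7-12, June 12, 2008
Recognition of sign language words by HMMs focusing on hand position and movement

前畠大, 西田昌史, 堀内靖雄, 黒岩眞吾

IEICE Technical Report. DE, Data Engineering 108(93) 7-12, June 12, 2008
Analysis of spoken dialogue strategies considering mental workload

松尾典義, 西田昌史, 堀内靖雄, 市川熹

Subaru Technical Review (35) 171-175, June 2008
A study of the website search behavior of visually impaired users

飯塚潤一, 岡本明, 堀内靖雄, 市川熹

IEICE Technical Report. WIT, Well-being Information Technology 108(67) 25-30, May 29, 2008

It is difficult for visually impaired users to search for information on websites with a voice browser. We observed the operating procedures of visually impaired participants and considered what awareness and strategies guided their searches. We found that few problems stemmed from search behavior peculiar to visually impaired users and that attention to usability was effective.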
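Analysis of changes in hand movement due to differences in signing speed in Japanese Sign Language

安ヶ平雄太, 堀内靖雄, 西田昌史, 黒岩眞吾

IEICE Technical Report. WIT, Well-being Information Technology 108(67) 85-90, May 29, 2008

Synthesis of sign language CG animation is being researched and developed to realize information transmission systems that use sign language. One such approach synthesizes animation from motion capture data, but it has the problem that each word can only be expressed as it was recorded. To address this, this study analyzes arm movements in signed sentences produced at different speeds, with the aim of clarifying the relationship between changes in signing speed and changes in arm movement. The analysis suggested that signers change signing speed mainly by changing trajectory length; when the movement is constrained, or depending on the influence of the preceding and following movements, trajectory length changes little and the speed of arm movement changes instead.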
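Analysis of nods and conjunctions in Japanese Sign Language

堀内靖雄, 亀崎紘子, 西田昌史, 黒岩眞吾, 市川熹

IEICE Technical Report. WIT, Well-being Information Technology 108(67) 91-96, May 29, 2008

In this study, we analyzed following nods and conjunctions in Japanese Sign Language dialogue. The results suggested that a signer's following nod by itself serves functions similar to those of conjunctions: topicalization, sequential connection, condition, and exiting a role shift. An analysis of the co-occurrence of conjunctions and nods showed that when a conjunction linking the preceding and following parts is expressed manually as a word, a nod tends to occur in synchrony with that word, but nods do not co-occur with negative words.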
A study of the website search behavior of visually impaired users

飯塚潤一, 岡本明, 堀内靖雄, 市川熹

IEICE Technical Report. SP, Speech 108(66) 25-30, May 29, 2008

Analysis of changes in hand movement due to differences in signing speed in Japanese Sign Language

安ヶ平雄太, 堀内靖雄, 西田昌史, 黒岩眞吾

IEICE Technical Report. SP, Speech 108(66) 85-90, May 29, 2008

Analysis of nods and conjunctions in Japanese Sign Language

堀内靖雄, 亀崎紘子, 西田昌史, 黒岩眞吾, 市川熹

IEICE Technical Report. SP, Speech 108(66) 91-96, May 29, 2008

Analysis of nods and conjunctions in Japanese Sign Language

堀内靖雄, 亀崎紘子, 西田昌史, 黒岩眞吾, 市川熹

IEICE Technical Report 108(67(WIT2008 1-19)) 91-96, May 22, 2008
Analysis of changes in hand movement due to differences in signing speed in Japanese Sign Language

安ケ平雄太, 堀内靖雄, 西田昌史, 黒岩眞吾

IEICE Technical Report 108(67(WIT2008 1-19)) 85-90, May 22, 2008
A study of the website search behavior of visually impaired users

飯塚潤一, 岡本明, 堀内靖雄, 市川熹

IEICE Technical Report 108(67(WIT2008 1-19)) 25-30, May 22, 2008

It is difficult for visually impaired users to search for information on websites with a voice browser. We observed the operating procedures of visually impaired participants and considered what awareness and strategies guided their searches. We found that few problems stemmed from search behavior peculiar to visually impaired users and that attention to usability was effective.
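Detection and identification of system-directed utterances in spoken dialogue for in-vehicle information devices

西田昌史, 神谷佐武郎, 堀内靖雄, 黒岩眞吾

Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) 2008, paper no. 1-Q-30, March 10, 2008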

Academic society memberships

Works

Research projects (joint research and competitive funding)

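A cross-cutting analysis of speech and sign language on the prosody of dialogue-type natural languages

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2020 - March 2024

堀内靖雄
Research on constructing a multipurpose Japanese Sign Language database

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, May 2017 - March 2021

長嶋祐二, 原大介, 堀内靖雄, 酒向慎司
Research on music generation and analysis based on mathematical models of composition, performance, and signals

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2017 - March 2020

嵯峨山茂樹, 北原鉄朗, 齋藤康之, 堀玄, 小野順貴, 中村和幸, 堀内靖雄, 齋藤大輔, 饗庭絵里子
A word-recall support method for people with aphasia based on analysis of the conversation techniques of speech-language therapists

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2016 - March 2019

黒岩眞吾, 堀内靖雄, 村西幸代, 古川大輔
Elucidating the prosodic functions of sign language and speech as dialogue-type natural languages with different modalities

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2015 - March 2019

堀内靖雄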
