堀内靖雄

ホリウチヤスオ (Yasuo Horiuchi)

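Basic Information

Affiliation: Associate Professor, Graduate School of Informatics, Chiba University

Degree: Doctor of Engineering (Tokyo Institute of Technology, March 1995)

J-GLOBAL ID: 200901021029331583
researchmap member ID: 1000191929

Research Areas

Information and communication / Intelligent informatics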

Awards

Major Papers

Determining the base frequency of the F0 contour generation model for the diverse expression of speech

Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno

Acoustical Science and Technology 46(1), January 2025 (peer-reviewed)
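A Study of the Factors That Shape the Functions Common to "Languages of Dialogue"

市川熹, 長嶋祐二, 堀内靖雄

The Journal of the Acoustical Society of Japan 80(7) 355-366, July 2024 (peer-reviewed)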
Constructing a Highly Accurate Japanese Sign Language Motion Database Including Dialogue

Yuji Nagashima, Keiko Watanabe, Daisuke Hara, Yasuo Horiuchi, Shinji Sako, Akira Ichikawa

Communications in Computer and Information Science 76-81, June 2020 (peer-reviewed)
Discussion of a Japanese sign language database and its annotation systems with consideration for its use in various areas

Shinji Sako, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Keiko Watanabe, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

Proceedings of LingCologne 2019, June 6, 2019 (peer-reviewed)
Construction of a Japanese Sign Language Database with Various Data Types

Keiko Watanabe, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Shinji Sako, Akira Ichikawa

Communications in Computer and Information Science 317-322, 2019 (peer-reviewed)
Constructing a Japanese Sign Language Multi-Dimensional Database

Yuji Nagashima, Daisuke Hara, Shinji Sako, Keiko Watanabe, Yasuo Horiuchi, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

The 7th Meeting of Signed and Spoken Language Linguistics (SSLL 2018), September 28, 2018 (peer-reviewed)
The Structure of "Languages of Dialogue" with a Light Mental Burden

市川熹, 堀内靖雄, 長嶋祐二

Transactions of Human Interface Society 20(2) 191-204, 2018 (peer-reviewed)

We have previously reported experimental results on the prosody of languages used in real-time dialogue, such as speech, sign language, and finger braille. In this paper, these results are discussed alongside research findings from inside and outside Japan. On this basis, we examine a structure that enables real-time dialogue with a light mental burden. We then propose a model that makes real-time dialogue possible by elucidating the information structures of these languages. The proposed model can explain various phenomena in real-time dialogue.


MISC (559 items)

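Listener Responses to Speaker Nods in Natural Dialogue (Theme: "e-Learning and Interactive Technology: Applications and Developments of Spoken Language Processing and Dialogue Technology in Education" and General Topics)

前田真季子, 西田昌史, 堀内靖雄

SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 39 35-42, November 6, 2003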
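A Genetic-Algorithm-Based Method for Estimating F0 Model Parameters and Its Application to Turn-Taking Analysis

木村太郎, 西田昌史, 堀内靖雄, 市川熹

IEICE Technical Report. SP, Speech 106(333) 37-42, November 3, 2003

We propose a method for estimating the parameters of the F0 model in which α and β are treated as variables. Estimating F0 model parameters with α and β as variables is an optimization problem with many parameters, but using a genetic algorithm for the analysis-by-synthesis (A-b-S) process makes efficient optimization possible. The proposed method has two stages, one that obtains initial values and one that optimizes them, and both use A-b-S driven by a genetic algorithm. We evaluated the method in simulations and in experiments on real speech and obtained good accuracy. We also statistically analyzed the relationship between the parameters and the presence or absence of turn-taking in dialogue data, examining the effect of the phrase-component parameters α and A_p. For utterances at which turn-taking occurs, α took large values and A_p took small values.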
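A Study of a Method for Estimating Sentence Structure from Prosody Alone

大須賀智子, 堀内靖雄, 市川あきら

Proceedings of the Meeting of the Acoustical Society of Japan 2003 229-230, September 17, 2003
A Preliminary Study on the Perception of Speech Segment Boundaries Using Accent Phrases

畑野智栄, 堀内靖雄, 市川あきら

Proceedings of the Meeting of the Acoustical Society of Japan 2003 385-386, September 17, 2003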
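An Analysis of Local Features in Estimating Sentence Structure from Prosody

大須賀智子, 堀内靖雄, 市川あきら

JSAI Technical Report, SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 38th 01-06, July 4, 2003
An Analysis of Local Features in Estimating Sentence Structure from Prosody (Theme: General)

大須賀智子, 堀内靖雄, 市川熹

SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 38 1-6, July 4, 2003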
Generating Natural Hand Motion in Sign Language CG Animation

杉浦俊久, 市川熹, 堀内靖雄, 長嶋祐二

Journal of the Visualization Society of Japan. Suppl. 23(1) 343-346, July 1, 2003
Estimating Sentence Structure from Local Prosodic Features

大須賀智子, 堀内靖雄, 市川熹

IPSJ SIG Technical Reports, Spoken Language Processing (SLP) 2003(58) 1-6, May 27, 2003

In this study, we examine a method for estimating the syntactic tree structure of Japanese speech using only prosodic information, namely the F0 contour and duration. We hypothesized that a listener can infer the syntactic relation to the following part from the leading part of the speech alone, and we propose an estimation method that uses only local prosodic features of the final mora of the preceding accent phrase. We applied the method to the ATR 503-sentence speech database. The branching judgment for each sequence of three leaves (the subtree level) reached an accuracy of about 76%, a drop of only about 4% from the conventional, more global parameters that span the following speech segment. In other words, dependency relations to the following segment can be estimated to some extent from local prosodic information in the preceding segment alone. This suggests that local features also contribute to understanding sentence structure, and that prosodic information may be structured robustly enough to support human utterance understanding in real time and in real environments.
Estimating Sentence Structure from Local Prosodic Features

大須賀智子, 堀内靖雄, 市川あきら

IPSJ SIG Technical Reports 2003(57(NL-155)) 71-76, May 26, 2003

Estimating Syntactic Structure from Prosody in Japanese Speech (co-authored)

IEICE Transactions on Information and Systems E86D(3) 558-564, March 2003
Learning Musical Expression by Multiple Regression Analysis with Multiple Models

福井浩司, 堀内靖雄, 市川あきら

IPSJ SIG Technical Reports 2003(16(MUS-49)) 13-18, February 21, 2003

In this study, so that an accompaniment system can exploit a soloist's musical expression in its control, we propose a method that learns the soloist's expressive performances and predicts the solo performance with that expression taken into account. Human performances carry a variety of musical expression, and an accompaniment system should adapt better if it can predict that expression. Focusing on solo performance, we recorded expressive human performances of monophonic scores. We applied multiple regression analysis with several models to the recorded performance histories and built, for each note in the score, a model that predicts the human performance. We examine the types of models obtained from the recordings and their prediction errors, and discuss the feasibility of automatic learning in an accompaniment system.
An Analysis of Human Ensemble Control during Rests

石毛大悟, 堀内靖雄, 市川あきら

IPSJ SIG Technical Reports 2003(16(MUS-49)) 7-12, February 21, 2003

This paper describes how a human performer controls ensemble playing when the solo part contains rests. We first prepared scores containing rests for analysis and recorded 96 ensemble performances by a computer soloist and a human accompanist. We formed several hypotheses about the accompanist's ensemble control when the solo part has a rest, built model equations by multiple regression analysis, and evaluated them by their error against the human performance data. The results suggest that where difference information, such as the immediately preceding time lag from the soloist, is available, the accompanist uses it, but where the partner is resting and no such information is available, the accompanist plays according to performance information from about one bar earlier.
An Analysis of Human Performance Control Using Human-Human Ensemble Data

坂本圭司, 堀内靖雄, 市川あきら

IPSJ SIG Technical Reports 2003(16(MUS-49)) 1-6, February 21, 2003

In this paper, we analyze ensemble data from human performers to estimate a model of human performance control in cooperative playing. We recorded 48 ensemble performances by 8 pairs of pianists (two groups, each with 2 soloists and 2 accompanists). We assume that an accompanist decides the upcoming performance from two histories: the history of the time difference between the two performers and the history of the accompanist's own tempo changes. Multiple regression analysis of the recorded data confirmed that the proposed model is more accurate than a conventional model. The model coefficients further suggest that human performers use not only the immediately preceding information but also information from further in the past. We also propose a soloist model in addition to the accompanist model, and ran ensemble simulations with two performer agents to verify the validity of the estimated models; the simulations produced valid ensembles.
An Analysis of Human Performance Control Using Human-Human Ensemble Data

坂本圭司, 堀内靖雄, 市川熹

IPSJ SIG Technical Reports. [Music and Computer] 49(16) 1-6, February 21, 2003

An Analysis of Human Ensemble Control during Rests

石毛大悟, 堀内靖雄, 市川熹

IPSJ SIG Technical Reports. [Music and Computer] 49(16) 7-12, February 21, 2003

An Analysis of the Mutual Relationship of Gestures in Natural Dialogue

前田真季子, 堀内靖雄, 市川あきら

IPSJ SIG Technical Reports 2003(9(HI-102)) 39-46, January 30, 2003

People use gestures such as gaze and nodding to keep dialogue running smoothly. Natural dialogue proceeds through interaction between speakers using audio and visual information, so interaction between speakers can be expected among gestures as well, as with backchannels in speech. This paper focuses on nodding and analyzes interaction through gestures. The data are 18 dialogues by six pairs of close friends, recorded with two prompters, each of which captures a frontal video of one speaker while projecting the partner's image through a half mirror. We annotated the recordings with the publicly available annotation tool "ANVIL", developed by Michael Kipp, and with a transcription tool of our own. The analysis showed that nods tend to occur more often within a speaker's own utterance than as a response to the partner's utterance in the manner of a backchannel. It was also suggested that the two speakers often nod simultaneously.
An Analysis of the Mutual Relationship of Gestures in Natural Dialogue

前田真季子, 堀内靖雄, 市川熹

IPSJ SIG Technical Reports. HI, Human Interface 102(9) 39-46, January 30, 2003

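[Special Lecture] Dialogue Languages and Support for People with Communication Disabilities

IEICE Technical Report. WIT, Well-being Information Technology 103(402) 21-28, 2003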
How does human segment the speech by prosody?

Toshie Hatano, Yasuo Horiuchi, Akira Ichikawa

EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology 149-152, January 1, 2003

In this study, we introduced a new model of how a human understands speech in real time and performed a cognitive experiment to investigate the unit for processing and understanding speech. In the model, first humans segment the acoustical signal into some acoustical units, and then the mental lexicon is accessed and searched for the segmented units. For this segmentation, we believe that prosody information must be used. In order to investigate how humans segment acoustical speech using only prosody, we performed an experiment in which participants listened to a pair of segmented speech materials, where each material was divided from the same speech material where the two segmentation positions differed from each other, and judged which material sounded more natural. On the basis of the results of this experiment, it is suggested that humans tend to segment speech based on the accent rules of Japanese, and that the introduced model is supported.
Estimating syntactic structure from prosody in Japanese speech

Tomoko Ohsuga, Yasuo Horiuchi, Akira Ichikawa

IEICE Transactions on Information and Systems E86-D 558-564, January 1, 2003

In this study, we introduce a method for estimating the syntactic structure of Japanese speech from the F0 contour and pause duration. We define a prosodic unit (PU) as a segment delimited by a local minimum of the F0 contour or by a pause. By repeatedly combining PUs (a pair of PUs is combined into one PU), a tree structure is gradually generated. Which pair of PUs in a sequence of three PUs should be combined is decided by a discriminant function based on discriminant analysis of a corpus of speech data. We applied the method to the ATR Phonetically Balanced Sentences read by four Japanese speakers. With this method, the rate of correct judgments for each sequence of three PUs is 79%, and the estimation accuracy of the entire syntactic structure for each sentence is 26%. We consider this result to demonstrate a good degree of accuracy for the difficult task of estimating syntactic structure from prosody alone.
C206 Generating Natural Hand Motion in Sign Language CG Animation

杉浦俊久, 市川熹, 堀内靖雄, 長嶋祐二

Journal of the Visualization Society of Japan 23(1) 343-346, 2003

We aim to realize a Japanese Sign Language (JSL) dialogue system. We have developed a JSL animation system at Kogakuin University, but in that system the position of the elbows was unnatural. In this study, a 3D motion tracking system was used to track the accurate positions of the signer's hands and elbows. Electromagnetic sensors were attached to the backs of the signer's hands and to the elbows. We applied multiple linear regression analysis to determine the position of the elbows in JSL animation. Compared with the previous method, the proposed method yielded a smaller error.
A Study on the Perception of Accent Phrase Boundaries

畑野智栄, 堀内靖雄, 市川あきら

IEICE Technical Report 102(527(NLC2002 44-70)) 75-80, December 19, 2002
A Study on the Perception of Accent Phrase Boundaries

畑野智栄, 堀内靖雄, 市川熹

IPSJ SIG Technical Reports, Spoken Language Processing (SLP) 2002(121) 75-80, December 16, 2002

In this study, we posit a hypothetical model of the cognitive processing of Japanese speech and experimentally examine the unit processed at its initial stage. In the model, the speech signal is segmented using prosodic information at an early stage of cognitive processing, and each segment is then matched against the listener's mental dictionary. We assume that the unit of this segmentation corresponds to what is conventionally called the accent phrase, and we tested whether accent phrases can be perceived as units without phonological information. In the experiment, participants segmented meaningless Japanese speech pronounced in imitation of meaningful speech. The results suggest that accent phrase boundaries roughly coincide with the positions of cognitively natural segments. The cognitive unit can therefore be detected from prosody alone and could serve as the unit for dictionary matching, which is consistent with the hypothetical model.
A Study on the Perception of Accent Phrase Boundaries

畑野智栄, 堀内靖雄, 市川熹

IEICE Technical Report. NLC, Natural Language Understanding and Models of Communication 102(527) 75-80, December 12, 2002

A Study on the Perception of Accent Phrase Boundaries

畑野智栄, 堀内靖雄, 市川熹

IEICE Technical Report. SP, Speech 102(529) 75-80, December 12, 2002

Computers That Perform Music

堀内靖雄

Systems, Control and Information 46(11) 671-676, November 15, 2002
Speech Recognition Using Utterance Prediction in a Spoken Dialogue System

玉井孝幸, 堀内靖雄, 市川あきら

IPSJ SIG Technical Reports 2002(98(SLP-43)) 1-6, October 25, 2002

In this paper, we propose a speech recognition method for spontaneous speech that uses predictions of the user's next utterance in a spoken dialogue system. A recognition dictionary and a language model are generated dynamically from the set of candidate sentences predicted for each dialogue state, and they are built so as to accept spontaneous speech, including inversion, ellipsis, and fillers. In addition, recognition with a language model for large-vocabulary speech recognition runs in parallel, and comparing the recognition likelihoods of the two makes it possible to detect failed predictions. In evaluation experiments, the method achieved a recognition rate of 100% on spontaneous utterances other than those with inserted fillers and 97.4% on utterances with inserted fillers. It also detected 96.2% of prediction failures, which shows the effectiveness of the method.
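An Experimental Study of the Effect of Nodding in Sign Language Dialogue

土肥修, 長嶋祐二, 寺内美奈, 堀内靖雄, 市川あきら

IEICE Technical Report 102(317(PRMU2002 70-79)) 45-50, September 19, 2002
On Recognizing Global Motion in Sign Language Using Optical Flow

岡沢裕二, 堀内靖雄, 市川あきら

IEICE Technical Report 102(317(PRMU2002 70-79)) 39-44, September 19, 2002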
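On Recognizing Global Motion in Sign Language Using Optical Flow

岡澤裕二, 堀内靖雄, 市川熹

IEICE Technical Report. WIT, Well-being Information Technology 102(319) 39-44, September 12, 2002

We report on extracting and recognizing the global motion of the hands from video. Japanese Sign Language is a complex visual language expressed with various parts of the body, and it supports dialogue in real time. We have hypothesized that global features aid the understanding of sign language expressions, and have examined the effectiveness of sign language sentence recognition using global features alone and using global features together with situational information. This report describes the extraction of hand motion from video using optical flow and motion recognition based on it. From the optical flow estimated from the video, we create a two-dimensional pattern whose values are the directions of hand motion, extract features from the pattern, and perform DP matching, obtaining a top-1 accuracy of 91.3%.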
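An Experimental Study of the Effect of Nodding in Sign Language Dialogue

土肥修, 長嶋祐二, 寺内美奈, 堀内靖雄, 市川熹

IEICE Technical Report. WIT, Well-being Information Technology 102(319) 45-50, September 12, 2002

Sign language is an interactive language that uses visual information. The signer conveys information to the addressee with the hands, fingers, and facial expressions, and at the same time obtains various visual information from the addressee. The signer is thought to form the rhythm of signing while receiving this information. To examine what effect the addressee's information has on the signer, we ran an experiment that controlled the addressee's nodding and observed changes in the signer's signing. The nodding conditions were normal nodding, no nodding, and nodding in time with 1-second, 3-second, and 4-second rhythms. The results show that the signer's signing speed, the frequency of non-manual signals, sentence structure, and other aspects tend to change with the frequency of the addressee's nodding.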
On Recognizing Global Motion in Sign Language Using Optical Flow

岡澤裕二, 堀内靖雄, 市川熹

IEICE Technical Report. PRMU, Pattern Recognition and Media Understanding 102(317) 39-44, September 12, 2002

An Experimental Study of the Effect of Nodding in Sign Language Dialogue

土肥修, 長嶋祐二, 寺内美奈, 堀内靖雄, 市川熹

IEICE Technical Report. PRMU, Pattern Recognition and Media Understanding 102(317) 45-50, September 12, 2002

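Temporal Prosody in Finger Braille and Sign Language

堀内靖雄, 市川あきら

IEICE Technical Report 102(254(TL2002 10-23)) 83-89, July 29, 2002
A Study of Estimating Sentence Structure from Temporal and Spatial Structure in Japanese Sign Language

桑子浩明, 堀内靖雄, 市川あきら

IEICE Technical Report 102(254(TL2002 10-23)) 77-82, July 29, 2002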
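A Study of Estimating Sentence Structure from Temporal and Spatial Structure in Japanese Sign Language

桑子浩明, 堀内靖雄, 市川熹

IEICE Technical Report. TL, Thought and Language 102(254) 77-82, July 22, 2002

Japanese Sign Language, like speech, is an interactive natural language. For dialogue between signers to proceed smoothly, its structure is expected to be one that people can process in real time. Unlike speech, Japanese Sign Language is a visual language that uses space, so dialogue makes use of both time and space. This temporal and spatial structure may influence sentence structure in ways that support real-time understanding. In this study, we focused on spatial structure and examined its relationship to sentence structure. The experimental results revealed no useful relationship with spatial position or with the distance between signs.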
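Temporal Prosody in Finger Braille and Sign Language

堀内靖雄, 市川熹

IEICE Technical Report. TL, Thought and Language 102(254) 83-89, July 22, 2002

In interactive natural languages, prosodic information is thought to play an important role in real-time recognition and understanding. This paper presents the results of analyzing prosodic information in three interactive natural languages, namely spoken Japanese, the finger braille used by deafblind people, and the Japanese Sign Language used by people with hearing impairments, and introduces methods of structural analysis based on prosodic information. Finally, we discuss the role of prosody in these interactive natural languages.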
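An Analysis of Gaze and Nodding in Turn-Taking

前田真季子, 堀内靖雄, 市川あきら

JSAI Technical Report, SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 35th 53-58, June 7, 2002
Estimating Syntactic Structure from Fundamental Frequency and Pauses

大須賀智子, 鈴木則夫, 堀内靖雄, 市川あきら

JSAI Technical Report, SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 35th 41-46, June 7, 2002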
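Estimating Syntactic Structure from Fundamental Frequency and Pauses (Theme: General)

大須賀智子, 鈴木則夫, 堀内靖雄

SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 35 41-46, June 7, 2002
An Analysis of Gaze and Nodding in Turn-Taking (Theme: General)

前田真季子, 堀内靖雄, 市川熹

SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 35 53-58, June 7, 2002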
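An Analysis of Head Movements in Japanese Sign Language

土肥修, 堀内靖雄, 市川あきら

IEICE Technical Report 101(703(WIT2001 43-50)) 7-12, March 8, 2002
A Preliminary Study of Methods for Analyzing Individual Differences in Prosodic Information

大須賀智子, 堀内靖雄, 市川あきら

JSAI Technical Report, SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 34th 63-68, March 8, 2002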
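A Preliminary Study of Methods for Analyzing Individual Differences in Prosodic Information (Theme: Corpus-Based Discourse and Dialogue Research)

大須賀智子, 堀内靖雄, 市川熹

SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) 34 63-68, March 8, 2002
A Study of PC Usage Environments for Deafblind People

江波孝彦, 堀内靖雄, 市川あきら

IEICE Technical Report 101(702(WIT2001 38-42)) 7-12, March 7, 2002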
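An Analysis of Head Movements in Japanese Sign Language

土肥修, 堀内靖雄, 市川熹

IEICE Technical Report. WIT, Well-being Information Technology 101(703) 7-12, March 1, 2002

In Japanese Sign Language, not only manual movements but also non-manual movements play an important role in communication. These include facial expressions such as blinking and mouth movements, head movements such as nodding and head shaking, and posture. However, the meanings of non-manual movements and the conditions under which they appear have not yet been clarified. In this paper, we restrict attention to head movements among the non-manual movements and analyze them in terms of the meaning of each movement and the timing at which it is produced. As a result, head movements could be classified into several types, and their motion and timing were found to differ according to their role.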
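A Study of PC Usage Environments for Deafblind People

江波孝彦, 堀内靖雄, 市川熹

IEICE Technical Report. WIT, Well-being Information Technology 101(702) 7-12, February 28, 2002

Deafblind people communicate by various means, but finger braille is considered the best suited to real-time communication. However, the number of deafblind people who have mastered finger braille is still small, because learning it places a heavy burden on both deafblind people and their caregivers. A finger braille learning system is therefore desired. One obstacle to realizing it is that a learning system for deafblind people requires an interface that a deafblind person can operate alone. At present, the output devices available to deafblind people using a PC are braille displays and finger braille output devices. We surveyed the PC environments of two deafblind users, compared the two devices, and considered what interface between deafblind people and PCs would be more appropriate.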
[Music Information Science] Estimating a Model of Human Onset-Timing Control in Ensemble Performance

堀内靖雄, 坂本圭司, 市川あきら

IPSJ Journal 43(2) 260-267, February 15, 2002
Estimating a Model of Human Onset-Timing Control in Ensemble Performance

堀内靖雄, 坂本圭司, 市川熹

IPSJ Journal 43(2) 260-267, February 15, 2002

In this paper, we estimate a model of how human performers control note onset timing in ensemble playing. We recorded and analyzed ensemble performances by human performers and a computer whose tempo changes can be specified arbitrarily in advance, and examined which past parameters correlate with the performer's subsequent playing. The performer's playing correlated strongly with the immediately preceding change in the time lag from the partner (which corresponds to the inverse of tempo) and, when a tempo change occurred, with the time lag itself. We therefore estimated a model of human performance control by multiple regression analysis on these two parameters. We compared the model's estimation error with the average human error, that is, the average time lag when a human performer plays with a constant-tempo partner. The comparison showed that the model is sufficiently accurate, with no significant difference between the errors of human performers and those of the model.
Computers That Perform Music

Journal of the Institute of Systems, Control and Information Engineers 46(11) 671-676, 2002

Professional Memberships

Works


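Research Projects (Joint Research, Competitive Funding, etc.)

A Cross-Cutting Analysis of Speech and Sign Language on the Prosody of Interactive Natural Languages

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2020 - March 2024

堀内靖雄
Research on Constructing a Multi-Purpose Japanese Sign Language Database

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, May 2017 - March 2021

長嶋祐二, 原大介, 堀内靖雄, 酒向慎司
Research on Music Generation and Analysis Based on Mathematical Models of Composition, Performance, and Signals

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2017 - March 2020

嵯峨山茂樹, 北原鉄朗, 齋藤康之, 堀玄, 小野順貴, 中村和幸, 堀内靖雄, 齋藤大輔, 饗庭絵里子
A Method for Supporting Word Recall in People with Aphasia Based on an Analysis of Speech-Language Therapists' Conversation Techniques

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2016 - March 2019

黒岩眞吾, 堀内靖雄, 村西幸代, 古川大輔
Elucidating the Prosodic Functions of Sign Language and Speech as Interactive Natural Languages of Different Modalities

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, April 2015 - March 2019

堀内靖雄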
