Super-Function Based Machine Translation (SFBMT), a type of Example-Based Machine Translation, can expand the coverage of its examples by turning nouns into variables. Date/time expressions, however, contain parts of speech other than nouns, so an entire date/time expression cannot be extracted as a single noun and only its numeric parts can be turned into variables. To solve this problem, this paper proposes a method for extracting date/time expressions for SFBMT. SFBMT uses noun determination rules to extract nouns and a bilingual dictionary to obtain correspondences of the extracted nouns between the source and target languages. In the proposed method, we first add rules for date/time expressions to the noun determination rules and extract date/time expressions from a Japanese-English bilingual corpus; we then convert the extracted expressions into a form common to Japanese and English to obtain their cross-language correspondences. An evaluation with the created rules yielded a precision of 96.7% and a recall of 98.2% for Japanese sentences, and a precision of 94.7% and a recall of 92.7% for English sentences.
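The extraction rules themselves are not given in the abstract; as a rough illustration of the idea, the following sketch uses hypothetical regular-expression rules to pull date/time expressions out of Japanese and English sentences and normalize them to a shared (year, month, day) form so that the two sides can be aligned.

```python
# A minimal sketch of rule-based date/time extraction and normalization to a
# language-independent form. The patterns and the normalized tuple format are
# hypothetical, not the rules used in the paper.
import re

JA_DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")
EN_DATE = re.compile(r"(January|February|March|April|May|June|July|August|"
                     r"September|October|November|December)\s+(\d{1,2}),\s*(\d{4})")
MONTHS = {m: i + 1 for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June",
     "July", "August", "September", "October", "November", "December"])}

def extract_ja(sentence):
    """Return (span, normalized (Y, M, D)) pairs found in a Japanese sentence."""
    return [(m.group(0), (int(m.group(1)), int(m.group(2)), int(m.group(3))))
            for m in JA_DATE.finditer(sentence)]

def extract_en(sentence):
    """Return (span, normalized (Y, M, D)) pairs found in an English sentence."""
    return [(m.group(0), (int(m.group(3)), MONTHS[m.group(1)], int(m.group(2))))
            for m in EN_DATE.finditer(sentence)]

ja = extract_ja("会議は2009年3月15日に開催された。")
en = extract_en("The meeting was held on March 15, 2009.")
# Expressions whose normalized forms match are treated as corresponding variables.
aligned = [(j, e) for j, nj in ja for e, ne in en if nj == ne]
print(aligned)   # [('2009年3月15日', 'March 15, 2009')]
```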
Search computing has been widely used in the field of natural language processing. In recent years, QA systems have been successfully applied to a number of tasks, but the limited number and the uncertainty of their answers remain evident. The wealth of information on the web makes it an attractive and convenient resource for finding quick answers. Although quite successful in providing keyword-based access to web pages, commercial search portals still lack the ability to answer questions expressed in natural-language Chinese. In this paper we propose a new method based on the Google WEB API for a QA system in restricted domains. Experiments show that the method obtains more accurate results.
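The abstract does not describe how search results are turned into answers; the sketch below only illustrates a common redundancy-based approach, assuming a placeholder `web_search` function stands in for calls to the Google WEB API and that candidate answers can be matched with a simple pattern (both are assumptions, not the paper's method).

```python
# A minimal sketch of redundancy-based answer selection over web snippets.
# `web_search` is a hypothetical stand-in for a real search API call; the
# candidate pattern is also illustrative only.
import re
from collections import Counter

def web_search(query):
    """Placeholder: return a list of result snippets for the query."""
    raise NotImplementedError("hook up a real search API here")

def answer(question, pattern=r"\d{4}"):
    """Pick the candidate matching `pattern` that occurs most often in snippets."""
    snippets = web_search(question)
    counts = Counter(c for s in snippets for c in re.findall(pattern, s))
    return counts.most_common(1)[0][0] if counts else None

# Example (with a real search backend): answer("北京奥运会是哪一年举办的?")
# would count year-like strings in the returned snippets and report the most
# frequent one as the answer.
```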
Existing sightseeing guide content on the web presents information in only one direction. We therefore built interactive guide content that can answer users' questions, using the CAIWA system. The content covers tourist spots in Tokushima, introducing their features and history. While watching a guide video, users can enter questions by text or speech at any time and receive the answers as video and audio commentary. In this way we realized flexible question-and-answer interaction between users and the system.
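The CAIWA system's interface is not described in the abstract; as a rough illustration of the kind of matching such an interactive guide needs, the sketch below scores a user question against hand-written Q&A pairs, each linked to a hypothetical video clip, and returns the best match.

```python
# A minimal sketch of question-to-answer matching for an interactive guide.
# The Q&A pairs, clip names, and word-overlap score are illustrative
# assumptions, not the CAIWA system's actual mechanism.
QA_PAIRS = [
    {"q": "阿波踊りはいつ開催されますか", "a": "阿波踊りは毎年8月に開催されます。", "clip": "awaodori.mp4"},
    {"q": "眉山の高さはどれくらいですか", "a": "眉山の標高は約290mです。", "clip": "bizan.mp4"},
]

def match(question):
    """Return the Q&A pair whose question shares the most characters with the input."""
    def overlap(a, b):
        return len(set(a) & set(b))
    return max(QA_PAIRS, key=lambda p: overlap(p["q"], question))

best = match("阿波踊りは何月にありますか")
print(best["a"], best["clip"])   # answer text and the clip to play back
```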
In research on emotion, there is a demand for emotion corpora: language data annotated with tags (emotion tags) that represent the emotion of the writer. Building such corpora by hand, however, is costly. Aiming to construct an emotion corpus automatically, this paper proposes an emotion classification method that applies a Naive Bayes classifier to the words in each sentence. We first attach emotion tags to sentences collected from the web, then use this data set to train the Naive Bayes classifier and classify new sentences, and confirm the effectiveness of the proposed method through an evaluation experiment.
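As an illustration of the classification step, the following is a minimal word-based Naive Bayes sketch with add-one smoothing; the emotion labels and the toy training examples are invented for the example and do not come from the paper.

```python
# A minimal sketch of word-based Naive Bayes emotion classification.
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (word_list, emotion). Returns per-emotion statistics."""
    counts, totals, docs = defaultdict(Counter), Counter(), Counter()
    for words, emotion in examples:
        counts[emotion].update(words)
        totals[emotion] += len(words)
        docs[emotion] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, docs, vocab

def classify(words, model):
    counts, totals, docs, vocab = model
    n = sum(docs.values())
    best, best_score = None, float("-inf")
    for emotion in counts:
        # log P(emotion) + sum of log P(word | emotion) with add-one smoothing
        score = math.log(docs[emotion] / n)
        for w in words:
            score += math.log((counts[emotion][w] + 1) / (totals[emotion] + len(vocab)))
        if score > best_score:
            best, best_score = emotion, score
    return best

model = train([(["嬉しい", "最高"], "joy"), (["悲しい", "涙"], "sadness")])
print(classify(["今日", "は", "嬉しい"], model))   # -> "joy"
```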
Existing methods for estimating the emotion expressed by a sentence can handle only a few kinds of emotion, or rely on a dictionary that assigns an emotion-specific weight to every individual word. We aim at an estimation method that can easily add new estimable emotions, that assigns emotion-specific weights to word N-grams so that emotional expressions such as modality, which conveys the speaker's judgment and mental attitude toward the content of a sentence, can also be exploited, and that achieves higher accuracy than the existing methods. Toward this goal, we previously proposed an emotion similarity calculation that measures how similar the emotions expressed by two sentences are: using corpora of sentences classified by emotion (emotion corpora) prepared in advance, the similarity between the input sentence and the sentences of each emotion is computed with a formula based on BLEU, a measure originally used to evaluate the translation quality of machine translation systems. In this paper, to estimate with higher accuracy than the conventional BLEU-based method, we propose a new method that introduces word N-gram frequency dictionaries built from each emotion corpus into the previous calculation. To examine its performance, we computed emotion similarities for input sentences and measured how often the emotion with the highest similarity agreed with the emotion of the input judged by hand. The agreement rate improved by 20.59% over the conventional BLEU-based similarity calculation.
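The similarity formula is not given in the abstract; the sketch below only shows the general shape of the idea, scoring the input against each emotion corpus by a hypothetical frequency-weighted N-gram overlap rather than the paper's actual BLEU-based formula.

```python
# A minimal sketch of picking the emotion whose corpus is most similar to the
# input sentence. The scoring (unigram/bigram overlap weighted by how often
# each N-gram appears in the emotion corpus) is illustrative only.
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def corpus_ngram_freq(corpus, n_max=2):
    """Word N-gram frequency dictionary for one emotion corpus."""
    freq = Counter()
    for sent in corpus:
        for n in range(1, n_max + 1):
            freq.update(ngrams(sent, n))
    return freq

def emotion_similarity(words, freq, n_max=2):
    """Frequency-weighted N-gram overlap between the input and one corpus."""
    score, total = 0.0, 0
    for n in range(1, n_max + 1):
        for g in ngrams(words, n):
            score += freq.get(g, 0)
            total += 1
    return score / total if total else 0.0

corpora = {
    "joy": [["今日", "は", "嬉しい"], ["最高", "だ"]],
    "sadness": [["悲しい", "気持ち", "だ"], ["涙", "が", "出る"]],
}
freqs = {e: corpus_ngram_freq(c) for e, c in corpora.items()}
inp = ["今日", "は", "とても", "嬉しい"]
print(max(freqs, key=lambda e: emotion_similarity(inp, freqs[e])))   # -> "joy"
```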
In this paper, we propose a real-time blind equalization method with multiple reference cepstra for the ETSI standard Distributed Speech Recognition (DSR) front-end, which suppresses the degradation of recognition performance caused by differences in the frequency characteristics of input devices. The ETSI Advanced DSR front-end includes a blind equalization method that estimates the compensating bias from a single reference vector; when the input speech is short or contains many similar phonemes, this bias may not be estimated accurately. The proposed method instead computes the bias frame-synchronously from multiple references, estimating it more accurately and normalizing the frequency characteristics of the input device in real time. Because DSR clients are limited in memory and computation, the references are represented as combinations of the VQ codebook centroids used in the feature-compression stage of the DSR front-end, which keeps the increase in memory size and computation cost small. Recognition experiments on the ASJ Japanese newspaper article read speech corpus with the ETSI Advanced DSR front-end show that the proposed method suppresses the degradation caused by mismatched frequency characteristics better than the conventional blind equalization; in particular, it reduced the word error rate of the ETSI Advanced DSR front-end (blind equalization) by 10.8% under the MIRS filter condition.
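The exact bias update is not reproduced in the abstract; the following sketch shows one plausible frame-synchronous scheme in which each frame is compared with its nearest reference cepstrum and a smoothed bias is subtracted. The nearest-reference rule and the smoothing factor are assumptions for illustration.

```python
# A minimal sketch of frame-synchronous bias estimation with multiple
# reference cepstra. The recursive update and the way the references would be
# built from the ETSI VQ codebooks are not reproduced here.
import numpy as np

def equalize(frames, references, alpha=0.99):
    """frames: (T, D) cepstra; references: (R, D) reference cepstra.

    For each frame, the bias is nudged toward the difference between the frame
    and its nearest reference, then subtracted from the frame (channel
    normalization in the cepstral domain).
    """
    bias = np.zeros(frames.shape[1])
    out = np.empty_like(frames)
    for t, c in enumerate(frames):
        nearest = references[np.argmin(np.linalg.norm(references - c, axis=1))]
        bias = alpha * bias + (1.0 - alpha) * (c - nearest)   # frame-by-frame update
        out[t] = c - bias
    return out

# Toy usage: 13-dimensional cepstra, 4 reference vectors (e.g. built by
# combining VQ centroids from the DSR front-end's compression stage).
T, D, R = 200, 13, 4
frames = np.random.randn(T, D) + 0.5          # pretend channel offset of +0.5
refs = np.random.randn(R, D)
equalized = equalize(frames, refs)
```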