
Yasutaka Yanagita (柳田 育孝)

Basic Information

Affiliation
Specially Appointed Assistant Professor, Department of General Medicine, Chiba University Hospital
Degrees
Bachelor of Engineering (March 2008, Waseda University)
Bachelor of Medicine (March 2015, Nagasaki University)
Ph.D. in Medicine (March 2023, Chiba University)

Contact
ahna5650chiba-u.jp
Researcher Number
80971754
ORCID ID
 https://orcid.org/0000-0002-9213-8247
J-GLOBAL ID
202201005424900985
researchmap Member ID
R000040208

Major Papers

 37 items
  • Yasutaka Yanagita, Mutsuka Kurihara, Daiki Yokokawa, Takanori Uehara, Masatomi Ikusaka
    Annals of Internal Medicine: Clinical Cases, 3(11), November 19, 2024. Peer-reviewed; lead author.
  • Yasutaka Yanagita, Daiki Yokokawa, Shun Uchida, Yu Li, Takanori Uehara, Masatomi Ikusaka
    Journal of General Internal Medicine, September 23, 2024. Peer-reviewed; lead author. (A prompt sketch based on this study's design appears after this list.)
    BACKGROUND: Creating clinical vignettes requires considerable effort. Recent developments in generative artificial intelligence (AI) for natural language processing have been remarkable and may allow for the easy and immediate creation of diverse clinical vignettes. OBJECTIVE: In this study, we evaluated the medical accuracy and grammatical correctness of AI-generated clinical vignettes in Japanese and verified their usefulness. METHODS: Clinical vignettes were created using the generative AI model GPT-4-0613. The input prompts for the clinical vignettes specified the following seven elements: (1) age, (2) sex, (3) chief complaint and time course since onset, (4) physical findings, (5) examination results, (6) diagnosis, and (7) treatment course. The list of diseases integrated into the vignettes was based on 202 cases considered in the management of diseases and symptoms in Japan's Primary Care Physicians Training Program. The clinical vignettes were evaluated for medical and Japanese-language accuracy by three physicians using a five-point scale. A total score of 13 points or above was defined as "sufficiently beneficial and immediately usable with minor revisions," a score between 10 and 12 points was defined as "partly insufficient and in need of modifications," and a score of 9 points or below was defined as "insufficient." RESULTS: Regarding medical accuracy, of the 202 clinical vignettes, 118 scored 13 points or above, 78 scored between 10 and 12 points, and 6 scored 9 points or below. Regarding Japanese-language accuracy, 142 vignettes scored 13 points or above, 56 scored between 10 and 12 points, and 4 scored 9 points or below. Overall, 97% (196/202) of vignettes were available with some modifications. CONCLUSION: Overall, 97% of the clinical vignettes proved practically useful, based on confirmation and revision by Japanese medical physicians. Given the significant effort required by physicians to create vignettes without AI, using GPT is expected to greatly optimize this process.
  • Yasutaka Yanagita, Daiki Yokokawa, Fumitoshi Fukuzawa, Shun Uchida, Takanori Uehara, Masatomi Ikusaka
    BMC Medical Education, 24(1), 536, May 15, 2024. Peer-reviewed; lead author.
    BACKGROUND: An illness script is a specific script format geared to represent patient-oriented clinical knowledge organized around enabling conditions, faults (i.e., pathophysiological process), and consequences. Generative artificial intelligence (AI) stands out as an educational aid in continuing medical education. The effortless creation of a typical illness script by generative AI could help the comprehension of key features of diseases and increase diagnostic accuracy. No systematic summary of specific examples of illness scripts has been reported since illness scripts are unique to each physician. OBJECTIVE: This study investigated whether generative AI can generate illness scripts. METHODS: We utilized ChatGPT-4, a generative AI, to create illness scripts for 184 diseases based on the diseases and conditions integral to the National Model Core Curriculum in Japan for undergraduate medical education (2022 revised edition) and primary care specialist training in Japan. Three physicians applied a three-tier grading scale: "A" denotes that the content of each disease's illness script proves sufficient for training medical students, "B" denotes that it is partially lacking but acceptable, and "C" denotes that it is deficient in multiple respects. RESULTS: By leveraging ChatGPT-4, we successfully generated each component of the illness script for 184 diseases without any omission. The illness scripts received "A," "B," and "C" ratings of 56.0% (103/184), 28.3% (52/184), and 15.8% (29/184), respectively. CONCLUSION: Useful illness scripts were seamlessly and instantaneously created using ChatGPT-4 by employing prompts appropriate for medical students. The technology-driven illness script is a valuable tool for introducing medical students to key features of diseases.
  • Fumitoshi Fukuzawa, Yasutaka Yanagita, Daiki Yokokawa, Shun Uchida, Shiho Yamashita, Yu Li, Kiyoshi Shikino, Tomoko Tsukamoto, Kazutaka Noda, Takanori Uehara, Masatomi Ikusaka
    JMIR Medical Education, 10, e52674, April 8, 2024. Peer-reviewed.
    BACKGROUND: Medical history contributes approximately 80% to a diagnosis, although physical examinations and laboratory investigations increase a physician's confidence in the medical diagnosis. The concept of artificial intelligence (AI) was first proposed more than 70 years ago. Recently, its role in various fields of medicine has grown remarkably. However, no studies have evaluated the importance of patient history in AI-assisted medical diagnosis. OBJECTIVE: This study explored the contribution of patient history to AI-assisted medical diagnoses and assessed the accuracy of ChatGPT in reaching a clinical diagnosis based on the medical history provided. METHODS: Using clinical vignettes of 30 cases identified in The BMJ, we evaluated the accuracy of diagnoses generated by ChatGPT. We compared the diagnoses made by ChatGPT based solely on medical history with the correct diagnoses. We also compared the diagnoses made by ChatGPT after incorporating additional physical examination findings and laboratory data alongside history with the correct diagnoses. RESULTS: ChatGPT accurately diagnosed 76.6% (23/30) of the cases with only the medical history, consistent with previous research targeting physicians. We also found that this rate was 93.3% (28/30) when additional information was included. CONCLUSIONS: Although adding additional information improves diagnostic accuracy, patient history remains a significant factor in AI-assisted medical diagnosis. Thus, when using AI in medical diagnosis, it is crucial to include pertinent and correct patient histories for an accurate diagnosis. Our findings emphasize the continued significance of patient history in clinical diagnoses in this age and highlight the need for its integration into AI-assisted medical diagnosis systems.
  • Daiki Yokokawa, Yasutaka Yanagita, Yu Li, Shiho Yamashita, Kiyoshi Shikino, Kazutaka Noda, Tomoko Tsukamoto, Takanori Uehara, Masatomi Ikusaka
    Diagnosis (Berlin, Germany), February 23, 2024.
  • Yasutaka Yanagita, Daiki Yokokawa, Shun Uchida, Junsuke Tawara, Masatomi Ikusaka
    JMIR Formative Research, 7, e48023, October 13, 2023. Peer-reviewed; lead author.
    BACKGROUND: ChatGPT (OpenAI) has gained considerable attention because of its natural and intuitive responses. ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers, as stated by OpenAI as a limitation. However, considering that ChatGPT is an interactive AI that has been trained to reduce the output of unethical sentences, the reliability of the training data is high and the usefulness of the output content is promising. Fortunately, in March 2023, a new version of ChatGPT, GPT-4, was released, which, according to internal evaluations, was expected to increase the likelihood of producing factual responses by 40% compared with its predecessor, GPT-3.5. The usefulness of this version of ChatGPT in English is widely appreciated. It is also increasingly being evaluated as a system for obtaining medical information in languages other than English. Although it does not reach a passing score on the national medical examination in Chinese, its accuracy is expected to gradually improve. Evaluation of ChatGPT with Japanese input is limited, although there have been reports on the accuracy of ChatGPT's answers to clinical questions regarding the Japanese Society of Hypertension guidelines and on the performance of the National Nursing Examination. OBJECTIVE: The objective of this study is to evaluate whether ChatGPT can provide accurate diagnoses and medical knowledge for Japanese input. METHODS: Questions from the National Medical Licensing Examination (NMLE) in Japan, administered by the Japanese Ministry of Health, Labour and Welfare in 2022, were used. All 400 questions were included. Exclusion criteria were figures and tables that ChatGPT could not recognize; only text questions were extracted. We instructed GPT-3.5 and GPT-4 to input the Japanese questions as they were and to output the correct answers for each question. The output of ChatGPT was verified by 2 general practice physicians. In case of discrepancies, they were checked by another physician to make a final decision. The overall performance was evaluated by calculating the percentage of correct answers output by GPT-3.5 and GPT-4. RESULTS: Of the 400 questions, 292 were analyzed. Questions containing charts, which are not supported by ChatGPT, were excluded. The correct response rate for GPT-4 was 81.5% (237/292), which was significantly higher than the rate for GPT-3.5, 42.8% (125/292). Moreover, GPT-4 surpassed the passing standard (>72%) for the NMLE, indicating its potential as a diagnostic and therapeutic decision aid for physicians. CONCLUSIONS: GPT-4 reached the passing standard for the NMLE in Japan, entered in Japanese, although it is limited to written questions. As the accelerated progress in the past few months has shown, the performance of the AI will improve as the large language model continues to learn more, and it may well become a decision support system for medical professionals by providing more accurate information.
  • Yasutaka Yanagita, Kiyoshi Shikino, Kosuke Ishizuka, Shun Uchida, Yu Li, Daiki Yokokawa, Tomoko Tsukamoto, Kazutaka Noda, Takanori Uehara, Masatomi Ikusaka
    BMC Medical Education, 23(1), 383, May 25, 2023. Peer-reviewed; lead author.
    BACKGROUND: A clinical diagnostic support system (CDSS) can support medical students and physicians in providing evidence-based care. In this study, we investigate diagnostic accuracy based on the history of present illness between groups of medical students using a CDSS, Google, and neither (control). Further, the degree of diagnostic accuracy of medical students using a CDSS is compared with that of residents using neither a CDSS nor Google. METHODS: This study is a randomized educational trial. The participants comprised 64 medical students and 13 residents who rotated in the Department of General Medicine at Chiba University Hospital from May to December 2020. The medical students were randomly divided into the CDSS group (n = 22), Google group (n = 22), and control group (n = 20). Participants were asked to provide the three most likely diagnoses for 20 cases, mainly a history of a present illness (10 common and 10 emergent diseases). Each correct diagnosis was awarded 1 point (maximum 20 points). The mean scores of the three medical student groups were compared using a one-way analysis of variance. Furthermore, the mean scores of the CDSS, Google, and residents' (without CDSS or Google) groups were compared. RESULTS: The mean scores of the CDSS (12.0 ± 1.3) and Google (11.9 ± 1.1) groups were significantly higher than those of the control group (9.5 ± 1.7; p = 0.02 and p = 0.03, respectively). The residents' group's mean score (14.7 ± 1.4) was higher than the mean scores of the CDSS and Google groups (p = 0.01). Regarding common disease cases, the mean scores were 7.4 ± 0.7, 7.1 ± 0.7, and 8.2 ± 0.7 for the CDSS, Google, and residents' groups, respectively. There were no significant differences in mean scores (p = 0.1). CONCLUSIONS: Medical students who used the CDSS and Google were able to list differential diagnoses more accurately than those using neither. Furthermore, they could make the same level of differential diagnoses as residents in the context of common diseases. TRIAL REGISTRATION: This study was retrospectively registered with the University Hospital Medical Information Network Clinical Trials Registry on 24/12/2020 (unique trial number: UMIN000042831).
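
A note on the method described in the Journal of General Internal Medicine paper above: the study reports generating each vignette with GPT-4-0613 from a prompt specifying seven elements (age, sex, chief complaint and time course since onset, physical findings, examination results, diagnosis, and treatment course). The following is a minimal sketch of how such a prompt could be assembled and sent, assuming the OpenAI Python SDK; the prompt wording, the generate_vignette helper, and the example disease are illustrative assumptions, not the authors' actual code.

```python
# Minimal sketch (assumption): assemble a seven-element clinical-vignette prompt
# and send it to GPT-4-0613 via the OpenAI Python SDK (openai>=1.0).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SEVEN_ELEMENTS = [
    "age",
    "sex",
    "chief complaint and time course since onset",
    "physical findings",
    "examination results",
    "diagnosis",
    "treatment course",
]

def generate_vignette(disease: str) -> str:
    """Hypothetical helper: request a vignette that covers all seven elements."""
    prompt = (
        f"Create a clinical vignette in Japanese for the following disease: {disease}.\n"
        "Include all of these elements, clearly labeled:\n"
        + "\n".join(f"({i}) {element}" for i, element in enumerate(SEVEN_ELEMENTS, start=1))
    )
    response = client.chat.completions.create(
        model="gpt-4-0613",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Example disease chosen for illustration only; the study drew its list of
    # 202 diseases from Japan's Primary Care Physicians Training Program.
    print(generate_vignette("acute appendicitis"))
```

In the study itself, each generated vignette was then rated by three physicians on five-point scales, with totals of 13 points or more judged usable with only minor revision; the sketch above covers only the generation step.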

MISC

 34 items

Books and Other Publications

 4 items

Major Lectures and Oral Presentations

 32 items

Academic Society Memberships

 8 items

Research Projects (Joint Research and Competitive Funding)

 5 items

Social Contribution Activities

 1 item

Media Coverage

 2 items
  • Nikkei Medical, February 14, 2023 (online media)
    Through this series, readers should now understand how diagnostic errors are recognized and defined, as well as the factors that influence diagnosis. Those factors are varied, and one of them is technology and tools. Among such technologies and tools, artificial intelligence (AI) has been introduced, particularly in connection with electronic medical records and other health IT tools, as a way to reduce physicians' workload, and diagnostic support systems that apply AI are becoming an important part of the clinical infrastructure. The adoption of science and technology in medicine grows year by year; AI in particular has advanced from machine learning to deep learning and, starting with image-diagnosis support, has produced results in drug development, genome research, and other fields. For clinicians, the technology that offers the greatest benefit is diagnostic and treatment support in daily practice, and the clinical decision support system (CDSS) plays the central role there. CDSS technology will certainly be important for achieving Diagnostic Excellence 1,2), which aims to improve diagnostic quality with minimal resources, reach diagnoses in a timely manner, and explain them to patients while managing the patient experience and diagnostic uncertainty. In clinical diagnosis, it is well known that history-taking contributes about 80% of the diagnosis 3), and the diseases brought to mind during history-taking influence the subsequent physical examination, choice of tests, and assessment 4). In particular, having an appropriate differential diagnosis in mind by the end of history-taking has been reported to help avoid diagnostic errors 5), so effective use of a CDSS clearly reduces diagnostic errors. Mastering this technology will become an essential skill for practicing medicine.

Others

 2 items