Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders

Cited by: 35
Authors
Koga, Shunsuke [1 ,2 ]
Martin, Nicholas B. [1 ]
Dickson, Dennis W. [1 ]
Affiliations
[1] Mayo Clin, Dept Neurosci, Jacksonville, FL USA
[2] Hosp Univ Penn, Dept Pathol & Lab Med, 3400 Spruce St, Philadelphia, PA 19104 USA
Keywords
artificial intelligence; ChatGPT; clinicopathological conference; CPC; Google Bard; large language model; neuropathology; pathology
DOI
10.1111/bpa.13207
CLC number
R74 [Neurology and Psychiatry]
Abstract
This study explores the utility of large language models (LLMs), specifically ChatGPT and Google Bard, in predicting neuropathologic diagnoses from clinical summaries. A total of 25 cases of neurodegenerative disorders presented at the Mayo Clinic brain bank Clinico-Pathological Conferences were analyzed. The LLMs provided multiple pathologic diagnoses and their rationales, which were compared with the final clinical diagnoses made by physicians. ChatGPT-3.5, ChatGPT-4, and Google Bard made the correct primary diagnosis in 32%, 52%, and 40% of cases, respectively, and included the correct diagnosis among their listed diagnoses in 76%, 84%, and 76% of cases, respectively. These findings highlight the potential of artificial intelligence tools like ChatGPT in neuropathology, suggesting they may facilitate more comprehensive discussions in clinicopathological conferences.
Pages: 4
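The percentages in the abstract can be converted back to case counts. The following is a minimal sketch, assuming every percentage uses all 25 cases as its denominator (the abstract does not state this explicitly). The names `reported` and `percent_to_count` are illustrative and do not come from the paper.

```python
# Figures as reported in the abstract; the denominator of 25 is an assumption.
N_CASES = 25

reported = {
    "ChatGPT-3.5": {"primary": 32, "included": 76},
    "ChatGPT-4": {"primary": 52, "included": 84},
    "Google Bard": {"primary": 40, "included": 76},
}


def percent_to_count(percent, n=N_CASES):
    """Convert a reported percentage to a case count, assuming n cases in total."""
    return round(percent * n / 100)


# Case counts per model: correct primary diagnosis, and correct diagnosis
# included anywhere among the diagnoses the model listed.
counts = {
    model: {key: percent_to_count(value) for key, value in rates.items()}
    for model, rates in reported.items()
}

for model, c in counts.items():
    print(f"{model}: primary {c['primary']}/{N_CASES}, included {c['included']}/{N_CASES}")
```

Under that assumption, every reported percentage corresponds to a whole number of cases, which is consistent with a sample of 25.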