Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders

Cited by: 35

Authors:

Koga, Shunsuke ^{[1,2]}

Martin, Nicholas B. ^{[1]}

Dickson, Dennis W. ^{[1]}

Affiliations:

[1] Mayo Clin, Dept Neurosci, Jacksonville, FL USA

[2] Hosp Univ Penn, Dept Pathol & Lab Med, 3400 Spruce St, Philadelphia, PA 19104 USA

Source:

BRAIN PATHOLOGY | 2024 / Vol. 34 / Issue 03

Keywords:

artificial intelligence; ChatGPT; clinicopathological conference; CPC; Google Bard; large language model; neuropathology; pathology;

DOI:

10.1111/bpa.13207

Chinese Library Classification (CLC) number:

R74 [Neurology and Psychiatry];

Subject classification code:

Abstract:

This study explores the utility of the large language models (LLMs), specifically ChatGPT and Google Bard, in predicting neuropathologic diagnoses from clinical summaries. A total of 25 cases of neurodegenerative disorders presented at Mayo Clinic brain bank Clinico-Pathological Conferences were analyzed. The LLMs provided multiple pathologic diagnoses and their rationales, which were compared with the final clinical diagnoses made by physicians. ChatGPT-3.5, ChatGPT-4, and Google Bard correctly made primary diagnoses in 32%, 52%, and 40% of cases, respectively, while correct diagnoses were included in 76%, 84%, and 76% of cases, respectively. These findings highlight the potential of artificial intelligence tools like ChatGPT in neuropathology, suggesting they may facilitate more comprehensive discussions in clinicopathological conferences.


Pages: 4

20 entries in total

[1] Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology
Dhanvijay, Anup Kumar D.
Pinjar, Mohammed Jaffer
Dhokane, Nitin
Sorte, Smita R.
Kumari, Amita
Mondal, Himel
CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (08)
[2] Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard
Lim, Zhi Wei
Pushpanathan, Krithi
Yew, Samantha Min Er
Lai, Yien
Sun, Chen-Hsin
Lam, Janice Sing Harn
Chen, David Ziyou
Goh, Jocelyn Hui Lin
Tan, Marcus Chun Jin
Sheng, Bin
Cheng, Ching-Yu
Koh, Victor Teck Chang
Tham, Yih-Chung
EBIOMEDICINE, 2023, 95
[3] The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard
Daraqel, Baraa
Wafaie, Khaled
Mohammed, Hisham
Cao, Li
Mheissen, Samer
Liu, Yang
Zheng, Leilei
AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2024, 165 (06) : 652 - 662
[4] Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing
Kumari, Amita
Kumari, Anita
Singh, Amita
Singh, Sanjeet K.
Juhi, Ayesha
Dhanvijay, Anup Kumar D.
Pinjar, Mohammed Jaffer
Mondal, Himel
CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (08)
[5] Large language models (LLMs) in the evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard
Infante, A.
Gaudino, S.
Orsini, F.
Del Ciello, A.
Gulli, C.
Merlino, B.
Natale, L.
Iezzi, R.
Sala, E.
CLINICAL RADIOLOGY, 2024, 79 (02) : 102 - 106
[6] Evidence-based potential of generative artificial intelligence large language models in orthodontics: a comparative study of ChatGPT, Google Bard, and Microsoft Bing
Makrygiannakis, Miltiadis A.
Giannakopoulos, Kostis
Kaklamanos, Eleftherios G.
EUROPEAN JOURNAL OF ORTHODONTICS, 2024,
[7] Re: Large language models (LLMs) in evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity and Bard
Wiwanitkit, S.
Wiwanitkit, V.
CLINICAL RADIOLOGY, 2024, 79 (04)
[8] Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study
Giannakopoulos, Kostis
Kavadella, Argyro
Salim, Anas Aaqel
Stamatopoulos, Vassilis
Kaklamanos, Eleftherios G.
JOURNAL OF MEDICAL INTERNET RESEARCH, 2023, 25
[9] Evaluating the efficacy of leading large language models in the Japanese national dental hygienist examination: A comparative analysis of ChatGPT, Bard, and Bing Chat
Yamaguchi, Shino
Morishita, Masaki
Fukuda, Hikaru
Muraoka, Kosuke
Nakamura, Taiji
Yoshioka, Izumi
Soh, Inho
Ono, Kentaro
Awano, Shuji
JOURNAL OF DENTAL SCIENCES, 2024, 19 (04) : 2262 - 2267
[10] Re: Re: Large language models (LLMs) in evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard
Amato, Infante
CLINICAL RADIOLOGY, 2024, 79 (07)
