Performance Assessment of ChatGPT versus Bard in Detecting Alzheimer's Dementia

Cited by: 0

Authors:

Balamurali, B. T. [1]

Chen, Jer-Ming [1]

Affiliations:

[1] Singapore University of Technology and Design, Science, Mathematics and Technology (SMT), 8 Somapah Rd, Singapore 487372, Singapore

Source:

Diagnostics | 2024 / Vol. 14 / Issue 8

Keywords:

Large Language Models; chatbots; GPT-3.5; GPT-4; ChatGPT; Bard; Alzheimer's dementia; zero-shot learning; chain-of-thought; ecological diagnostic screening; spontaneous speech; MENTAL-STATE-EXAMINATION; DISEASE; INTERVENTION; PREVENTION; IMPAIRMENT; MOCA;

DOI:

10.3390/diagnostics14080817

Chinese Library Classification (CLC) number:

R5 [Internal Medicine]

Subject classification codes:

1002; 100201

Abstract:

Large language models (LLMs) are finding increasing application in many fields. Here, three LLM chatbots (ChatGPT-3.5, ChatGPT-4, and Bard) are assessed in their current, publicly available form for their ability to distinguish individuals with Alzheimer's dementia (AD) from Cognitively Normal (CN) individuals using textual input derived from spontaneous speech recordings. A zero-shot learning approach is used at two levels of independent queries, with the second query (chain-of-thought prompting) eliciting more detailed information than the first. Each chatbot's predictions are evaluated in terms of accuracy, sensitivity, specificity, precision, and F1 score. The chatbots generated a three-class outcome ("AD", "CN", or "Unsure"). When positively identifying AD, Bard produced the highest true-positive rate (89% recall) and the highest F1 score (71%), but tended to misidentify CN as AD with high confidence (low "Unsure" rates). When positively identifying CN, GPT-4 produced the highest true-negative rate (56%) and the highest F1 score (62%), adopting a diplomatic stance (moderate "Unsure" rates). Overall, the three LLM chatbots can distinguish AD from CN at above-chance levels but do not currently satisfy the requirements for clinical application.
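The evaluation described in the abstract can be illustrated with a minimal Python sketch that scores three-class predictions ("AD", "CN", or "Unsure") against binary ground truth for one target class at a time. The function name `binary_metrics`, the toy labels, and the choice to count any non-target prediction (including "Unsure") as a negative call are assumptions made for illustration only; the record does not state how the authors treated "Unsure" responses, and the toy data are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive="AD"):
    """One-vs-rest metrics for a single target class.

    Assumption of this sketch: any prediction other than `positive`
    (including "Unsure") is counted as a negative call.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "unsure_rate": ratio(sum(p == "Unsure" for p in y_pred), len(pairs)),
    }


# Hypothetical toy labels, not data from the study.
y_true = ["AD", "AD", "AD", "AD", "CN", "CN", "CN", "CN"]
y_pred = ["AD", "AD", "AD", "Unsure", "AD", "CN", "CN", "Unsure"]

ad_metrics = binary_metrics(y_true, y_pred, positive="AD")
cn_metrics = binary_metrics(y_true, y_pred, positive="CN")

for name, metrics in (("AD", ad_metrics), ("CN", cn_metrics)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Scoring each class separately mirrors the abstract's distinction between "positively identifying AD" and "positively identifying CN". A different treatment of "Unsure" (for example, excluding those cases before scoring) would change every metric except the "Unsure" rate itself.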

Pages: 13
