Performance Assessment of ChatGPT versus Bard in Detecting Alzheimer's Dementia

Cited by: 0
Authors
Balamurali, B. T. [1]
Chen, Jer-Ming [1]
Affiliation
[1] Singapore Univ Technol & Design, Sci Math & Technol SMT, 8 Somapah Rd, Singapore 487372, Singapore
Keywords
Large Language Models; chatbots; GPT-3.5; GPT-4; ChatGPT; Bard; Alzheimer's dementia; zero-shot learning; chain-of-thought; ecological diagnostic screening; spontaneous speech; Mental-State-Examination; Disease; Intervention; Prevention; Impairment; MoCA
DOI
10.3390/diagnostics14080817
Chinese Library Classification (CLC) Number
R5 [Internal Medicine]
Discipline Code
1002; 100201
Abstract
Large language models (LLMs) are finding increasing application in many fields. Here, three LLM chatbots (ChatGPT-3.5, ChatGPT-4, and Bard) are assessed in their current, publicly available form for their ability to distinguish individuals with Alzheimer's dementia (AD) from cognitively normal (CN) individuals, using textual input derived from spontaneous speech recordings. A zero-shot learning approach is applied at two levels of independent queries, with the second query (chain-of-thought prompting) eliciting more detailed information than the first. Each chatbot's predictions are evaluated in terms of accuracy, sensitivity, specificity, precision, and F1 score. The chatbots generated a three-class outcome ("AD", "CN", or "Unsure"). For positively identifying AD, Bard produced the most true positives (89% recall) and the highest F1 score (71%). However, it tended to misidentify CN individuals as AD with high confidence (low "Unsure" rates). For positively identifying CN, GPT-4 achieved the highest true-negative rate (56%) and the highest F1 score (62%), adopting a diplomatic stance (moderate "Unsure" rates). Overall, all three chatbots distinguish AD from CN at above-chance levels, but none currently satisfies the requirements for clinical application.
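The evaluation above applies binary metrics to a three-class outcome. The abstract does not say how "Unsure" answers enter those metrics. The sketch below therefore rests on three assumptions. First, AD is the positive class for sensitivity and precision. Second, CN recall serves as specificity (the true-negative rate). Third, an "Unsure" answer counts as a miss for the true class and as a false positive for neither class. The function names `class_metrics` and `summarize` and the toy labels are hypothetical illustrations. They are not the authors' code or data.

```python
from collections import Counter
from typing import Dict, Sequence

# Three-class chatbot outcome described in the abstract.
LABELS = ("AD", "CN", "Unsure")


def class_metrics(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Dict[str, float]:
    """Recall, precision and F1 with `positive` ("AD" or "CN") as the positive class.

    ASSUMPTION (not stated in the source): an "Unsure" prediction is never a
    positive call, so it is a false negative for the true class and a false
    positive for neither class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)  # includes "Unsure"
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def summarize(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision, F1 and the "Unsure" rate."""
    n = len(y_true)
    ad = class_metrics(y_true, y_pred, "AD")
    cn = class_metrics(y_true, y_pred, "CN")
    return {
        # "Unsure" never matches a true label, so it lowers accuracy.
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "sensitivity": ad["recall"],   # AD recall (true-positive rate)
        "specificity": cn["recall"],   # CN recall (true-negative rate)
        "precision_AD": ad["precision"],
        "f1_AD": ad["f1"],
        "f1_CN": cn["f1"],
        "unsure_rate": Counter(y_pred)["Unsure"] / n,
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    y_true = ["AD", "AD", "AD", "CN", "CN", "CN"]
    y_pred = ["AD", "AD", "Unsure", "AD", "CN", "Unsure"]
    for name, value in summarize(y_true, y_pred).items():
        print(f"{name}: {value:.2f}")
```

On the toy labels, the script prints an accuracy of 0.50, a sensitivity of 0.67, a specificity of 0.33, and an "Unsure" rate of 0.33. Because "Unsure" is scored as a miss here, a more hesitant chatbot loses recall but avoids false positives. A different convention, such as excluding "Unsure" cases from the denominator, would change every number.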
Pages: 13