ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case-Based Questions

Cited by: 14
Authors
Buhr, Christoph Raphael [1, 2, 6]
Smith, Harry [3]
Huppertz, Tilman [1]
Bahr-Hamm, Katharina [1]
Matthias, Christoph [1]
Blaikie, Andrew [2]
Kelsey, Tom [3]
Kuhn, Sebastian [4, 5]
Eckrich, Jonas [1]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Univ Med Ctr, Dept Otorhinolaryngol, Mainz, Germany
[2] Univ St Andrews, Sch Med, St Andrews, Scotland
[3] Univ St Andrews, Sch Comp Sci, St Andrews, Scotland
[4] Philipps Univ Marburg, Inst Digital Med, Marburg, Germany
[5] Univ Hosp Giessen & Marburg, Marburg, Germany
[6] Johannes Gutenberg Univ Mainz, Univ Med Ctr, Dept Otorhinolaryngol, Langenbeckstr 1, D-55131 Mainz, Germany
Source
JMIR MEDICAL EDUCATION | 2023 / Vol. 9
Keywords
large language models; LLMs; LLM; artificial intelligence; AI; ChatGPT; otorhinolaryngology; ORL; digital health; chatbots; global health; low- and middle-income countries; telemedicine; telehealth; language model; chatbot; ONLINE HEALTH INFORMATION; NONVERBAL-COMMUNICATION; SEEKING; ANXIETY; GOOGLE
DOI
10.2196/49183
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101; 120403;
Abstract
Background: Large language models (LLMs), such as ChatGPT (OpenAI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more "consultations" of LLMs about personal medical symptoms.
Objective: This study aims to evaluate ChatGPT's performance in answering clinical case-based questions in otorhinolaryngology (ORL) in comparison to ORL consultants' answers.
Methods: We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. They also identified (in a blinded setting) whether each answer was created by an ORL consultant or by ChatGPT. Additionally, the character count was compared. Due to the rapidly evolving pace of technology, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included.
Results: Ratings in all categories were significantly higher for ORL consultants (P<.001). Although inferior to the scores of the ORL consultants, ChatGPT's scores were relatively higher in the semantic categories (conciseness, coherence, and comprehensibility) than in medical adequacy. ORL consultants correctly identified ChatGPT as the source in 98.4% (121/123) of cases. ChatGPT's answers had a significantly higher character count than those of the ORL consultants (P<.001). The comparison between responses generated by ChatGPT 3 and ChatGPT 4 showed a slight improvement in medical accuracy as well as better coherence of the answers provided. In contrast, neither conciseness (P=.06) nor comprehensibility (P=.08) improved significantly, despite a significant increase in the mean character count of 52.5% ((1470-964)/964; P<.001).
Conclusions: While ChatGPT provided longer answers to medical problems, medical adequacy and conciseness were significantly lower compared to ORL consultants' answers. LLMs have potential as augmentative tools for medical care, but their "consultation" for medical problems carries a high risk of misinformation, as their high semantic quality may mask contextual deficits.
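The two percentages quoted in the Results can be checked directly from the counts given in the abstract. The short sketch below is only an arithmetic illustration: the helper name `percent` and the variable names are hypothetical, and treating 964 as the ChatGPT 3 mean and 1470 as the ChatGPT 4 mean is an assumption based on how the abstract presents the increase.

```python
def percent(numerator: float, denominator: float) -> float:
    """Return numerator / denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Mean character counts from the abstract. Assumption: 964 is the
# ChatGPT 3 mean (baseline) and 1470 is the ChatGPT 4 mean.
baseline_mean_chars = 964
newer_mean_chars = 1470
increase = percent(newer_mean_chars - baseline_mean_chars, baseline_mean_chars)

# Share of cases in which consultants correctly identified ChatGPT
# as the source (121 of 123).
identified = percent(121, 123)

print(f"Relative increase in mean character count: {increase}%")
print(f"Correct identification of ChatGPT: {identified}%")
```

Both values agree with the figures reported in the abstract (52.5% and 98.4%).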
Pages: 9
Related papers
50 in total
  • [41] Similarity evaluation between problem and case in case-based learning for modeling
    Hu, Xiang-Pei
    Qian, Guo-Ming
    Li, Wen-Yu
    Xu, Yong-Ren
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2001, 33 (05): 587-591
  • [43] Preclinical curriculum of prospective case-based teaching with faculty- and student-blinded approach
    Waliany, Sarah
    Caceres, Wendy
    Merrell, Sylvia Bereknyei
    Thadaney, Sonoo
    Johnstone, Noelle
    Osterberg, Lars
    BMC MEDICAL EDUCATION, 2019, 19 (1)
  • [44] The Accuracy of ChatGPT-Generated Responses in Answering Commonly Asked Patient Questions About Labor Epidurals: A Survey-Based Study
    Mootz, Allison A.
    Carvalho, Brendan
    Sultan, Pervez
    Nguyen, Teresa P.
    Reale, Sharon C.
    ANESTHESIA AND ANALGESIA, 2024, 138 (05): 1142-1144
  • [45] Evaluation of Readiness of IT Organizations to Agile Transformation Based on Case-Based Reasoning
    Orlowski, Cezary
    Deregowski, Tomasz
    Kurzawski, Milosz
    Ziolkowski, Artur
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2017), PT II, 2017, 10192: 787-797
  • [46] Automatic Generation of Medical Case-Based Multiple-Choice Questions (MCQs): A Review of Methodologies, Applications, Evaluation, and Future Directions
    Al Shuraiqi, Somaiya
    Abdulsalam, Abdulrahman Aal
    Masters, Ken
    Zidoum, Hamza
    Alzaabi, Adhari
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (10)
  • [47] Comparison of traditional essay questions versus case based modified essay questions in biochemistry
    Bansal, Aastha
    Dubey, Abhishek
    Singh, Vijay Kumar
    Goswami, Binita
    Kaushik, Smita
    BIOCHEMISTRY AND MOLECULAR BIOLOGY EDUCATION, 2023, 51 (05): 494-498
  • [48] CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering
    Wiratunga, Nirmalie
    Abeyratne, Ramitha
    Jayawardena, Lasal
    Martin, Kyle
    Massie, Stewart
    Nkisi-Orji, Ikechukwu
    Weerasinghe, Ruvan
    Liret, Anne
    Fleisch, Bruno
    CASE-BASED REASONING RESEARCH AND DEVELOPMENT, ICCBR 2024, 2024, 14775: 445-460
  • [49] Autism Training in Pediatric Residency: Evaluation of a Case-Based Curriculum
    Major, Nili E.
    Peacock, Georgina
    Ruben, Wendy
    Thomas, Jana
    Weitzman, Carol C.
    JOURNAL OF AUTISM AND DEVELOPMENTAL DISORDERS, 2013, 43 (05): 1171-1177
  • [50] Implementation and Evaluation of Case-Based Learning Approach in Microbiology and Immunology
    Sannathimmappa, Mohan
    Nambiar, Vinod
    Arvindakshan, Rajeev
    INTERNATIONAL JOURNAL OF MEDICAL RESEARCH & HEALTH SCIENCES, 2019, 8 (01): 1-+