Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany

Cited by: 19
Authors
Roos, Jonas [1]
Kasapovic, Adnan [1]
Jansen, Tom [1]
Kaczmarczyk, Robert [2, 3]
Affiliations
[1] Univ Hosp Bonn, Dept Orthoped & Trauma Surg, Bonn, Germany
[2] Tech Univ Munich, Dept Dermatol & Allergy, Munich, Germany
[3] Tech Univ Munich, Dept Dermatol & Allergy, Biedersteiner Str 29, D-80802 Munich, Germany
Source
JMIR MEDICAL EDUCATION | 2023 / Vol. 9
Keywords
medical education; state examinations; exams; large language models; artificial intelligence; ChatGPT
DOI
10.2196/46482
Chinese Library Classification
G40 [Education]
Discipline classification code
040101; 120403
Abstract
Background: Large language models (LLMs) have demonstrated significant potential in diverse domains, including medicine. Nonetheless, there is a scarcity of studies examining their performance in medical examinations, especially those conducted in languages other than English, and in direct comparison with medical students. Analyzing the performance of LLMs in state medical examinations can provide insights into their capabilities and limitations and evaluate their potential role in medical education and examination preparation.
Objective: This study aimed to assess and compare the performance of 3 LLMs, GPT-4, Bing, and GPT-3.5-Turbo, in the German Medical State Examinations of 2022 and to evaluate their performance relative to that of medical students.
Methods: The LLMs were assessed on a total of 630 questions from the spring and fall German Medical State Examinations of 2022. The performance was evaluated with and without media-related questions. Statistical analyses included 1-way ANOVA and independent samples t tests for pairwise comparisons. The relative strength of the LLMs in comparison with that of the students was also evaluated.
Results: GPT-4 achieved the highest overall performance, correctly answering 88.1% of questions, closely followed by Bing (86.0%) and GPT-3.5-Turbo (65.7%). The students had an average correct answer rate of 74.6%. Both GPT-4 and Bing significantly outperformed the students in both examinations. When media questions were excluded, Bing achieved the highest performance of 90.7%, closely followed by GPT-4 (90.4%), while GPT-3.5-Turbo lagged (68.2%). There was a significant decline in the performance of GPT-4 and Bing in the fall 2022 examination, which was attributed to a higher proportion of media-related questions and a potential increase in question difficulty.
Conclusions: LLMs, particularly GPT-4 and Bing, demonstrate potential as valuable tools in medical education and for pretesting examination questions. Their high performance, even relative to that of medical students, indicates promising avenues for further development and integration into the educational and clinical landscape.
Pages: 7