Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany

Cited by: 19
Authors
Roos, Jonas [1]
Kasapovic, Adnan [1]
Jansen, Tom [1]
Kaczmarczyk, Robert [2, 3]
Affiliations
[1] Univ Hosp Bonn, Dept Orthoped & Trauma Surg, Bonn, Germany
[2] Tech Univ Munich, Dept Dermatol & Allergy, Munich, Germany
[3] Tech Univ Munich, Dept Dermatol & Allergy, Biedersteiner Str 29, D-80802 Munich, Germany
Source
JMIR MEDICAL EDUCATION | 2023 / Vol. 9
Keywords
medical education; state examinations; exams; large language models; artificial intelligence; ChatGPT
DOI
10.2196/46482
Chinese Library Classification number
G40 [Education]
Discipline classification code
040101; 120403
Abstract
Background: Large language models (LLMs) have demonstrated significant potential in diverse domains, including medicine. Nonetheless, there is a scarcity of studies examining their performance in medical examinations, especially those conducted in languages other than English, and in direct comparison with medical students. Analyzing the performance of LLMs in state medical examinations can provide insights into their capabilities and limitations and evaluate their potential role in medical education and examination preparation.
Objective: This study aimed to assess and compare the performance of 3 LLMs, GPT-4, Bing, and GPT-3.5-Turbo, in the German Medical State Examinations of 2022 and to evaluate their performance relative to that of medical students.
Methods: The LLMs were assessed on a total of 630 questions from the spring and fall German Medical State Examinations of 2022. The performance was evaluated with and without media-related questions. Statistical analyses included 1-way ANOVA and independent samples t tests for pairwise comparisons. The relative strength of the LLMs in comparison with that of the students was also evaluated.
Results: GPT-4 achieved the highest overall performance, correctly answering 88.1% of questions, closely followed by Bing (86.0%) and GPT-3.5-Turbo (65.7%). The students had an average correct answer rate of 74.6%. Both GPT-4 and Bing significantly outperformed the students in both examinations. When media questions were excluded, Bing achieved the highest performance of 90.7%, closely followed by GPT-4 (90.4%), while GPT-3.5-Turbo lagged (68.2%). There was a significant decline in the performance of GPT-4 and Bing in the fall 2022 examination, which was attributed to a higher proportion of media-related questions and a potential increase in question difficulty.
Conclusions: LLMs, particularly GPT-4 and Bing, demonstrate potential as valuable tools in medical education and for pretesting examination questions. Their high performance, even relative to that of medical students, indicates promising avenues for further development and integration into the educational and clinical landscape.
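The abstract names independent samples t tests for pairwise comparisons but gives no computational detail. The following is only a minimal sketch of how such a comparison could look, assuming per-question scores coded as 1 (correct) or 0 (incorrect) and the pooled-variance (Student) form of the test; the abstract does not state which variant was used. The score lists and the function names percent_correct and pooled_t are hypothetical and are not data or code from the study.

```python
from math import sqrt
from statistics import mean, variance


def percent_correct(scores):
    """Percentage of questions answered correctly (scores are 0 or 1)."""
    return 100.0 * sum(scores) / len(scores)


def pooled_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical per-question results for two models (NOT study data).
model_a = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
model_b = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]

t_stat, dof = pooled_t(model_a, model_b)
print(percent_correct(model_a), percent_correct(model_b), round(t_stat, 3), dof)
```

A p value would then be read from the t distribution with the returned degrees of freedom; that step is omitted here to keep the sketch within the Python standard library.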
Pages: 7
Related papers
50 in total
  • [31] ChatGPT and Artificial Intelligence in Medical Writing: Concerns and Ethical Considerations
    Doyal, Alexander S.
    Sender, David
    Nanda, Monika
    Serrano, Ricardo A.
    [J]. CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (08)
  • [32] ChatGPT: roles and boundaries of the new artificial intelligence tool in medical education and health research - correspondence
    Periaysamy, Aravind Gandhi
    Satapathy, Prakasini
    Neyazi, Ahmad
    Padhi, Bijaya K.
    [J]. ANNALS OF MEDICINE AND SURGERY, 2023, 85 (04): 1317-1318
  • [33] Artificial Intelligence Readiness Among Jordanian Medical Students: Using Medical Artificial Intelligence Readiness Scale For Medical Students (MAIRS-MS)
    Hamad, Mohammad
    Qtaishat, Fares
    Mhairat, Enjood
    AL-Qunbar, Ahmad
    Jaradat, Maha
    Mousa, Abdullah
    Faidi, Baha'eddin
    Alkhaldi, Sireen
    [J]. JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT, 2024, 11
  • [34] The ChatGPT (Generative Artificial Intelligence) Revolution Has Made Artificial Intelligence Approachable for Medical Professionals
    Mesko, Bertalan
    [J]. JOURNAL OF MEDICAL INTERNET RESEARCH, 2023, 25
  • [35] Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students
    Wu, Yijun
    Zheng, Yue
    Feng, Baijie
    Yang, Yuqi
    Kang, Kai
    Zhao, Ailin
    [J]. JMIR MEDICAL EDUCATION, 2024, 10
  • [36] Assessment of Artificial Intelligence Platforms With Regard to Medical Microbiology Knowledge: An Analysis of ChatGPT and Gemini
    Ranjan, Jai
    Ahmad, Absar
    Subudhi, Monalisa
    Kumar, Ajay
    [J]. CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (05)
  • [37] Artificial Intelligence Education and Tools for Medical and Health Informatics Students: Systematic Review
    Sapci, A. Hasan
    Sapci, H. Aylin
    [J]. JMIR MEDICAL EDUCATION, 2020, 6 (01)
  • [38] Implications of artificial intelligence for medical education Comment
    Rampton, Vanessa
    Mittelman, Michael
    Goldhahn, Joerg
    [J]. LANCET DIGITAL HEALTH, 2020, 2 (03): E111-E112
  • [39] Future Implications of Artificial Intelligence in Medical Education
    Bohler, Forrest
    Aggarwal, Nikhil
    Peters, Garrett
    Taranikanti, Varna
    [J]. CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (01)
  • [40] Artificial Intelligence Revolutionizing the Field of Medical Education
    Narayanan, Suresh
    Ramakrishnan, Rajprasath
    Durairaj, Elantamilan
    Das, Arghya
    [J]. CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (11)