Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany

Cited by: 19
Authors
Roos, Jonas [1]
Kasapovic, Adnan [1]
Jansen, Tom [1]
Kaczmarczyk, Robert [2,3]
Affiliations
[1] Univ Hosp Bonn, Dept Orthoped & Trauma Surg, Bonn, Germany
[2] Tech Univ Munich, Dept Dermatol & Allergy, Munich, Germany
[3] Tech Univ Munich, Dept Dermatol & Allergy, Biedersteiner Str 29, D-80802 Munich, Germany
Source
JMIR MEDICAL EDUCATION | 2023 / Vol. 9
Keywords
medical education; state examinations; exams; large language models; artificial intelligence; ChatGPT
DOI
10.2196/46482
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Background: Large language models (LLMs) have demonstrated significant potential in diverse domains, including medicine. Nonetheless, few studies have examined their performance in medical examinations, especially examinations conducted in languages other than English, or compared them directly with medical students. Analyzing the performance of LLMs in state medical examinations can provide insights into their capabilities and limitations and help evaluate their potential role in medical education and examination preparation.
Objective: This study aimed to assess and compare the performance of 3 LLMs (GPT-4, Bing, and GPT-3.5-Turbo) in the German Medical State Examinations of 2022 and to evaluate their performance relative to that of medical students.
Methods: The LLMs were assessed on a total of 630 questions from the spring and fall German Medical State Examinations of 2022. Performance was evaluated with and without media-related questions. Statistical analyses included 1-way ANOVA and independent samples t tests for pairwise comparisons. The relative strength of the LLMs in comparison with that of the students was also evaluated.
Results: GPT-4 achieved the highest overall performance, correctly answering 88.1% of questions, closely followed by Bing (86.0%); GPT-3.5-Turbo answered 65.7% correctly. The students had an average correct answer rate of 74.6%. Both GPT-4 and Bing significantly outperformed the students in both examinations. When media questions were excluded, Bing achieved the highest performance of 90.7%, closely followed by GPT-4 (90.4%), while GPT-3.5-Turbo lagged (68.2%). There was a significant decline in the performance of GPT-4 and Bing in the fall 2022 examination, which was attributed to a higher proportion of media-related questions and a potential increase in question difficulty.
Conclusions: LLMs, particularly GPT-4 and Bing, demonstrate potential as valuable tools in medical education and for pretesting examination questions. Their high performance, even relative to that of medical students, indicates promising avenues for further development and integration into the educational and clinical landscape.
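To put the reported rates on a common footing, the following minimal Python sketch converts the overall percentages from the abstract into the approximate number of correct answers out of 630 and into the gap, in percentage points, from the student average. The counts are only implied by the rounded percentages and are not reported in the abstract; the names `approx_correct` and `gap_vs_students` are illustrative, not taken from the study.

```python
# Figures below are taken from the abstract; the derived counts are
# approximations implied by rounded percentages, not reported values.
TOTAL_QUESTIONS = 630
STUDENT_RATE = 74.6  # average percent correct among students

reported_rates = {"GPT-4": 88.1, "Bing": 86.0, "GPT-3.5-Turbo": 65.7}


def approx_correct(rate_percent, total=TOTAL_QUESTIONS):
    """Approximate number of correct answers implied by a rounded percentage."""
    return round(rate_percent / 100 * total)


def gap_vs_students(rate_percent, student_rate=STUDENT_RATE):
    """Difference from the student average, in percentage points."""
    return round(rate_percent - student_rate, 1)


summary = {
    name: (approx_correct(rate), gap_vs_students(rate))
    for name, rate in reported_rates.items()
}

for name, (count, gap) in summary.items():
    print(f"{name}: ~{count}/{TOTAL_QUESTIONS} correct, {gap:+.1f} points vs students")
```

On these figures, GPT-4 and Bing sit more than 11 percentage points above the student average, while GPT-3.5-Turbo sits about 9 points below it. This is a descriptive comparison only; it does not reproduce the ANOVA or t tests named in the Methods.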
Pages: 7