Comparison of the Performance of Artificial Intelligence Versus Medical Professionals in the Polish Final Medical Examination

Cited by: 1
Authors
Jaworski, Aleksander [1]
Jasinski, Dawid [2]
Jaworski, Wojciech [3]
Hop, Aleksandra [4]
Janek, Artur [1]
Slawinska, Barbara [5]
Konieczniak, Lena [6]
Rzepka, Maciej [7]
Jung, Maximilian [8]
Syslo, Oliwia [9]
Jarzabek, Victoria [6]
Blecha, Zuzanna [5]
Harazinski, Konrad [5]
Jasinska, Natalia [10]
Affiliations
[1] Specialist Med Ctr Joint Stock Co, Dept Med, Polanica Zdroj, Poland
[2] Med Univ Silesia, Dept Med, Prof K Gibinski Univ Clin Ctr, Katowice, Poland
[3] Med Univ Silesia, Dept Childrens Dev Defects Surg & Traumatol, Katowice, Poland
[4] Fryderyk Chopin Univ, Clin Hosp Rzeszow, Dept Med, Rzeszow, Poland
[5] Med Univ Silesia, Dept Med, Katowice, Poland
[6] Reg Specialised Hosp 4 Bytom, Dept Med, Bytom, Poland
[7] St Barbara Specialised Reg Hosp 5, Dept Med, Sosnowiec, Poland
[8] Univ Clin Hosp Opole, Dept Med, Opole, Poland
[9] Acad Silesia, Dept Med, Katowice, Poland
[10] Mil Univ Technol, Dept Cybernet, Warsaw, Poland
Keywords
medical professionals; medical students; final medical examination; artificial intelligence; machine learning; ChatGPT
DOI
10.7759/cureus.66011
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
Background: The rapid development of artificial intelligence (AI) technologies such as OpenAI's Generative Pretrained Transformer (GPT), particularly ChatGPT, has shown promising applications in various fields, including medicine. This study evaluates ChatGPT's performance on the Polish Final Medical Examination (LEK) and compares it with that of human test-takers.
Methods: The study analyzed ChatGPT's ability to answer 196 multiple-choice questions from the spring 2021 LEK. Questions were categorized as "clinical cases" or "other" general medical knowledge, and then divided according to medical field. Two versions of ChatGPT (3.5 and 4.0) were tested. Statistical analyses, including Pearson's chi-squared test and the Mann-Whitney U test, were conducted to compare the AI's performance and confidence levels.
Results: ChatGPT 3.5 correctly answered 50.51% of the questions, while ChatGPT 4.0 answered 77.55% correctly, surpassing the 56% passing threshold. Version 3.5 showed significantly higher confidence in correct answers, whereas version 4.0 maintained consistent confidence regardless of answer accuracy. No significant differences in performance were observed across medical fields.
Conclusions: ChatGPT 4.0 demonstrated the ability to pass the LEK, indicating substantial potential for AI in medical education and assessment. Future improvements in AI models, such as the anticipated ChatGPT 5.0, may further enhance performance, potentially equaling or surpassing human test-takers.
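The percentages in the Results can be checked against the 196-question total. The following is a minimal sketch, not the authors' analysis: the correct-answer counts are back-calculated from the reported percentages (the abstract does not state them directly), and the function and variable names are hypothetical.

```python
# Figures taken from the abstract.
TOTAL_QUESTIONS = 196
PASS_THRESHOLD = 0.56  # 56% passing threshold
REPORTED_ACCURACY = {"ChatGPT 3.5": 50.51, "ChatGPT 4.0": 77.55}  # percent


def implied_correct(percent, total):
    """Whole number of correct answers closest to a reported percentage.

    Assumption: the abstract reports percentages only, so this count is inferred.
    """
    return round(percent / 100 * total)


def summarize(accuracy, total, threshold):
    """Recompute each model's share of correct answers and compare it to the threshold."""
    rows = {}
    for model, percent in accuracy.items():
        correct = implied_correct(percent, total)
        rows[model] = {
            "correct": correct,
            "recomputed_percent": round(100 * correct / total, 2),
            "passes": correct / total >= threshold,
        }
    return rows


summary = summarize(REPORTED_ACCURACY, TOTAL_QUESTIONS, PASS_THRESHOLD)
for model, row in summary.items():
    print(model, row)
```

Under these assumptions the reported figures correspond to 99 and 152 correct answers out of 196, and only version 4.0 clears the 56% threshold, which agrees with the abstract.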
Pages: 8