Comparison of the Performance of Artificial Intelligence Versus Medical Professionals in the Polish Final Medical Examination

Cited by: 1

Authors
Jaworski, Aleksander [1]
Jasinski, Dawid [2]
Jaworski, Wojciech [3]
Hop, Aleksandra [4]
Janek, Artur [1]
Slawinska, Barbara [5]
Konieczniak, Lena [6]
Rzepka, Maciej [7]
Jung, Maximilian [8]
Syslo, Oliwia [9]
Jarzabek, Victoria [6]
Blecha, Zuzanna [5]
Harazinski, Konrad [5]
Jasinska, Natalia [10]
Affiliations
[1] Specialist Med Ctr Joint Stock Co, Dept Med, Polanica Zdroj, Poland
[2] Med Univ Silesia, Dept Med, Prof K Gibinski Univ Clin Ctr, Katowice, Poland
[3] Med Univ Silesia, Dept Childrens Dev Defects Surg & Traumatol, Katowice, Poland
[4] Fryderyk Chopin Univ, Clin Hosp Rzeszow, Dept Med, Rzeszow, Poland
[5] Med Univ Silesia, Dept Med, Katowice, Poland
[6] Reg Specialised Hosp 4 Bytom, Dept Med, Bytom, Poland
[7] St Barbara Specialised Reg Hosp 5, Dept Med, Sosnowiec, Poland
[8] Univ Clin Hosp Opole, Dept Med, Opole, Poland
[9] Acad Silesia, Dept Med, Katowice, Poland
[10] Mil Univ Technol, Dept Cybernet, Warsaw, Poland
Keywords
medical professionals; medical students; final medical examination; artificial intelligence; machine learning; ChatGPT
DOI
10.7759/cureus.66011
Chinese Library Classification (CLC)
R5 [Internal Medicine];
Subject Classification Codes
1002; 100201;
Abstract
Background: The rapid development of artificial intelligence (AI) technologies such as OpenAI's Generative Pre-trained Transformer (GPT), particularly ChatGPT, has shown promising applications in various fields, including medicine. This study evaluates ChatGPT's performance on the Polish Final Medical Examination (LEK) and compares its efficacy to that of human test-takers. Methods: The study analyzed ChatGPT's ability to answer 196 multiple-choice questions from the spring 2021 LEK. Questions were categorized as "clinical cases" or "other" general medical knowledge, and then grouped by medical field. Two versions of ChatGPT (3.5 and 4.0) were tested. Statistical analyses, including Pearson's chi-squared test and the Mann-Whitney U test, were conducted to compare the AI's performance and confidence levels. Results: ChatGPT 3.5 correctly answered 50.51% of the questions, while ChatGPT 4.0 answered 77.55% correctly, surpassing the 56% passing threshold. Version 3.5 showed significantly higher confidence in its correct answers, whereas version 4.0 maintained consistent confidence regardless of answer accuracy. No significant differences in performance were observed across medical fields. Conclusions: ChatGPT 4.0 demonstrated the ability to pass the LEK, indicating substantial potential for AI in medical education and assessment. Future AI models, such as the anticipated ChatGPT 5.0, may further improve performance, potentially equaling or surpassing human test-takers.
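To illustrate the comparison the abstract describes, below is a minimal Python sketch using scipy.stats, assuming the analysis pairs a chi-squared test on correct/incorrect counts with a Mann-Whitney U test on per-answer confidence. The contingency counts are back-calculated from the reported percentages; the confidence ratings are invented placeholders, since the paper's raw data are not reproduced in this record.

    # Hedged sketch of the analysis shape; not the authors' actual code.
    from scipy.stats import chi2_contingency, mannwhitneyu

    N = 196                         # questions on the spring 2021 LEK
    correct_35 = round(0.5051 * N)  # 99 correct answers (ChatGPT 3.5)
    correct_40 = round(0.7755 * N)  # 152 correct answers (ChatGPT 4.0)
    threshold = round(0.56 * N)     # 110 correct needed to pass

    # 2x2 contingency table: rows = model version, columns = correct/incorrect
    table = [[correct_35, N - correct_35],
             [correct_40, N - correct_40]]
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"accuracy difference: chi2={chi2:.2f}, p={p:.4f}")

    # Placeholder 1-5 confidence ratings; real values would come from the study
    conf_correct = [5, 4, 5, 3, 4, 5, 4]
    conf_incorrect = [3, 2, 4, 2, 3, 3, 2]
    u, p_u = mannwhitneyu(conf_correct, conf_incorrect, alternative="two-sided")
    print(f"confidence vs. correctness: U={u:.1f}, p={p_u:.4f}")

With these back-calculated counts, ChatGPT 4.0 clears the 110-correct pass mark while 3.5 falls short, matching the abstract's reported outcome.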
Pages: 8