How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study

Cited by: 2
Authors
Azizoglu, Mustafa [1]
Aydogdu, Bahattin [2]
Affiliations
[1] Dicle Univ, Med Sch, Dept Pediat Surg, Diyarbakir, Turkiye
[2] Balikesir Univ, Dept Pediat Surg, Balikesir, Turkiye
Source
MEDICINA BALEAR | 2024 / Vol. 39 / No. 01
Keywords
ChatGPT; Pediatric Surgery; exam; questions; artificial intelligence
DOI
10.3306/AJHS.2024.39.01.23
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Purpose: This study compared the accuracy and responsiveness of GPT-3.5 and GPT-4 in pediatric surgery, specifically their ability to correctly answer sample questions from the European Board of Pediatric Surgery (EBPS) examination.

Methods: This comparative analysis of two AI language models, GPT-3.5 and GPT-4, was conducted between 20 May 2023 and 30 May 2023 using EBPS examination sample questions. Two sets of 105 questions each (210 in total), derived from the EBPS sample questions, were collated.

Results: In General Pediatric Surgery, GPT-3.5 answered 7 questions correctly (46.7%), whereas GPT-4 answered 13 correctly (86.7%) (p = 0.020). In Newborn Surgery and Pediatric Urology, GPT-3.5 answered 6 questions correctly (40.0%), whereas GPT-4 answered 12 correctly (80.0%) (p = 0.025). Overall, GPT-3.5 answered 46 of 105 questions correctly (43.8%), and GPT-4 performed significantly better, answering 80 correctly (76.2%) (p < 0.001). Across all responses, the odds ratio for a correct answer with GPT-4 versus GPT-3.5 was 4.1; that is, the odds of GPT-4 answering a pediatric surgery question correctly were 4.1 times the odds for GPT-3.5.

Conclusion: GPT-4 significantly outperforms GPT-3.5 in answering EBPS examination questions.
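The percentages, p-values, and odds ratio in the Results can be rechecked from the raw counts. The Python sketch below is a minimal reconstruction, not the authors' analysis. It assumes 15 questions per subsection (implied by the reported percentages) and a Pearson chi-square test without continuity correction; the abstract does not name its test, but this choice reproduces the reported p-values. The helper names `pearson_chi2_2x2` and `odds_ratio` are hypothetical. The sketch also prints the risk ratio, to show how it differs from the odds ratio.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p-value with 1 degree of freedom)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def odds_ratio(correct_a, wrong_a, correct_b, wrong_b):
    """Odds of a correct answer for model A divided by the odds for model B."""
    return (correct_a / wrong_a) / (correct_b / wrong_b)


# (correct, total) for GPT-3.5 and GPT-4, from the abstract.
# Assumption: 15 questions per subsection, inferred from the percentages.
counts = {
    "General Pediatric Surgery": ((7, 15), (13, 15)),
    "Newborn Surgery and Pediatric Urology": ((6, 15), (12, 15)),
    "Total": ((46, 105), (80, 105)),
}

for section, ((c35, n35), (c4, n4)) in counts.items():
    stat, p = pearson_chi2_2x2(c35, n35 - c35, c4, n4 - c4)
    print(f"{section}: GPT-3.5 {c35 / n35:.1%}, GPT-4 {c4 / n4:.1%}, "
          f"chi2={stat:.2f}, p={p:.3g}")

# Overall comparison: GPT-4 (80 correct, 25 wrong) vs GPT-3.5 (46 correct, 59 wrong)
or_total = odds_ratio(80, 25, 46, 59)
rr_total = (80 / 105) / (46 / 105)  # ratio of proportions correct
print(f"Odds ratio (GPT-4 vs GPT-3.5): {or_total:.1f}")
print(f"Risk ratio (GPT-4 vs GPT-3.5): {rr_total:.2f}")
```

With these counts the odds ratio is about 4.1, while the risk ratio (76.2% vs 43.8%) is about 1.74. The two diverge because correct answers are common for both models, so an odds ratio of 4.1 corresponds to GPT-4 being roughly 1.7 times as likely, not 4.1 times as likely, to answer correctly.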
Pages: 23-26
Number of pages: 4