Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study

Cited by: 0
Authors
Yoon, Soo-Hyuk [1 ]
Oh, Seok Kyeong [2 ]
Lim, Byung Gun [2 ]
Lee, Ho-Jin [1 ]
Institutions
[1] Seoul Natl Univ, Coll Med, Seoul Natl Univ Hosp, Dept Anesthesiol & Pain Med, Daehak Ro 101, Seoul 03080, South Korea
[2] Korea Univ, Guro Hosp, Coll Med, Dept Anesthesiol & Pain Med, Seoul, South Korea
Source
JMIR MEDICAL EDUCATION, 2024, Vol. 10
Keywords
AI tools; problem solving; anesthesiology; artificial intelligence; pain medicine; ChatGPT; health care; medical education; South Korea; BOARD
DOI
10.2196/56859
Chinese Library Classification: G40 [Education]
Discipline classification codes: 040101; 120403
Abstract
Background: ChatGPT has been tested in health care, including the US Medical Licensing Examination and specialty examinations, with near-passing results. Its performance in anesthesiology has been assessed using English board examination questions; however, its effectiveness in Korea remains unexplored.

Objective: This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in the Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education.

Methods: We investigated the performance (number of correct answers/number of questions) of GPT-4, GPT-3.5, and CLOVA X in the fields of anesthesiology and pain medicine, using in-training examinations administered to Korean anesthesiology residents over the past 5 years, each comprising 100 questions annually. Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess performance differences across languages, we conducted a comparative analysis of GPT-4's problem-solving proficiency using both the original Korean texts and their English translations.

Results: A total of 398 questions were analyzed. GPT-4 (67.8%) demonstrated significantly better overall performance than GPT-3.5 (37.2%) and CLOVA X (36.7%), whereas GPT-3.5 and CLOVA X did not differ significantly from each other. Additionally, GPT-4 showed superior performance on questions translated into English, indicating a language-processing discrepancy (English: 75.4% vs Korean: 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001).

Conclusions: This study underscores the potential of AI tools, such as ChatGPT, in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts. The findings suggest that although AI advancements are promising, they require careful evaluation and development to ensure acceptable performance across diverse linguistic and professional settings.
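The English-Korean performance gap reported in the Results can be checked approximately from the published figures. The following is a minimal sketch, assuming correct-answer counts back-calculated from the rounded percentages over 398 questions (300/398 English, 270/398 Korean; these counts are assumptions, not stated in the record) and an unpaired Wald interval; the study's paired design would call for a matched-pairs method, which yields the narrower CI (3.1%-11.9%) reported above:

```python
from math import sqrt

def two_proportion_diff_ci(k1: int, k2: int, n: int, z: float = 1.96):
    """Difference of two proportions with an unpaired 95% Wald CI.

    k1, k2: correct-answer counts; n: questions per condition.
    This treats the two conditions as independent samples, which
    overstates the interval width for paired (same-question) data.
    """
    p1, p2 = k1 / n, k2 / n
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return diff, (diff - z * se, diff + z * se)

# Assumed counts reconstructed from 75.4% and 67.8% of 398 questions.
diff, (lo, hi) = two_proportion_diff_ci(300, 270, 398)
print(f"difference = {diff:.1%}, unpaired 95% CI ({lo:.1%}, {hi:.1%})")
```

The point estimate reproduces the reported 7.5% difference; the interval excludes zero either way, consistent with the reported P=.001.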
Pages: 10