Large language models (LLMs) in radiology exams for medical students: Performance and consequences

Cited by: 0
Authors
Gotta, Jennifer [1]
Hong, Quang Anh Le [1]
Koch, Vitali [1]
Gruenewald, Leon D. [1]
Geyer, Tobias [2]
Martin, Simon S. [1]
Scholtz, Jan-Erik [1]
Booz, Christian [1]
Dos Santos, Daniel Pinto [1]
Mahmoudi, Scherwin [1]
Eichler, Katrin [1]
Gruber-Rouh, Tatjana [1]
Hammerstingl, Renate [1]
Biciusca, Teodora [1]
Juergens, Lisa Joy [1]
Hoehne, Elena [1]
Mader, Christoph [1]
Vogl, Thomas J. [1]
Reschke, Philipp [1]
Affiliations
[1] Goethe Univ Frankfurt, Dept Diagnost & Intervent Radiol, Frankfurt, Germany
[2] Rostock Univ, Med Ctr, Inst Diagnost & Intervent Radiol, Pediat Radiol & Neuroradiol, Rostock, Germany
Keywords
AI; medical; education
DOI
10.1055/a-2437-2067
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Purpose: The evolving field of medical education is being shaped by technological advancements, including the integration of Large Language Models (LLMs) like ChatGPT. These models could be invaluable resources for medical students, by simplifying complex concepts and enhancing interactive learning by providing personalized support. LLMs have shown impressive performance in professional examinations, even without specific domain training, making them particularly relevant in the medical field. This study aims to assess the performance of LLMs in radiology examinations for medical students, thereby shedding light on their current capabilities and implications.

Materials and Methods: This study was conducted using 151 multiple-choice questions, which were used for radiology exams for medical students. The questions were categorized by type and topic and were then processed using OpenAI's GPT-3.5 and GPT-4 via their API, or manually put into Perplexity AI with GPT-3.5 and Bing. LLM performance was evaluated overall, by question type and by topic.

Results: GPT-3.5 achieved a 67.6% overall accuracy on all 151 questions, while GPT-4 outperformed it significantly with an 88.1% overall accuracy (p<0.001). GPT-4 demonstrated superior performance in both lower-order and higher-order questions compared to GPT-3.5, Perplexity AI, and medical students, with GPT-4 particularly excelling in higher-order questions. All GPT models would have successfully passed the radiology exam for medical students at our university.

Conclusion: In conclusion, our study highlights the potential of LLMs as accessible knowledge resources for medical students. GPT-4 performed well on lower-order as well as higher-order questions, making ChatGPT-4 a potentially very useful tool for reviewing radiology exam questions. Radiologists should be aware of ChatGPT's limitations, including its tendency to confidently provide incorrect responses.
Pages: 11