Performance of large language models in oral and maxillofacial surgery examinations

Cited by: 2
Authors
Quah, B. [1 ,2 ]
Yong, C. W. [1 ,2 ]
Lai, C. W. M. [1 ]
Islam, I. [1 ,2 ]
Affiliations
[1] Natl Univ Singapore, Fac Dent, 9 Lower Kent Ridge Rd, Singapore 119085, Singapore
[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, Singapore, Singapore
Keywords
Artificial intelligence; Oral surgery; Dental education; Academic performance; Dentistry
DOI
10.1016/j.ijom.2024.06.003
Chinese Library Classification number
R78 [Stomatology]
Discipline classification code
1003
Abstract
This study aimed to determine the accuracy of large language models (LLMs) in answering oral and maxillofacial surgery (OMS) multiple choice questions. A total of 259 questions from the university's question bank were answered by the LLMs (GPT-3.5, GPT-4, Llama 2, Gemini, and Copilot). The scores per category as well as the total score out of 259 were recorded and evaluated, with the passing score set at 50%. The mean overall score amongst all LLMs was 62.5%. GPT-4 performed the best (76.8%, 95% confidence interval (CI) 71.4-82.2%), followed by Copilot (72.6%, 95% CI 67.2-78.0%), GPT-3.5 (62.2%, 95% CI 56.4-68.0%), Gemini (58.7%, 95% CI 52.9-64.5%), and Llama 2 (42.5%, 95% CI 37.1-48.6%). There was a statistically significant difference between the scores of the five LLMs overall (χ² = 79.9, df = 4, P < 0.001) and within all categories except 'basic sciences' (P = 0.129), 'dentoalveolar and implant surgery' (P = 0.052), and 'oral medicine/pathology/radiology' (P = 0.801). The LLMs performed best in 'basic sciences' (68.9%) and poorest in 'pharmacology' (45.9%). The LLMs can be used as adjuncts in teaching, but should not be used for clinical decision-making until the models are further developed and validated.
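The overall chi-square comparison reported in the abstract can be roughly reproduced from the published percentages. The sketch below is illustrative only and is not the authors' analysis code: the abstract does not give raw counts of correct answers, so they are reconstructed here by rounding each percentage times 259, and the helper names (`reconstruct_correct`, `chi_square_independence`) are hypothetical. It computes a Pearson chi-square statistic on the resulting 5 x 2 table (correct vs. incorrect per model).

```python
# Illustrative sketch: Pearson chi-square on a 5 x 2 table (model x correct/incorrect).
# ASSUMPTION: raw counts are not in the abstract; they are reconstructed
# by rounding (reported percentage x 259 questions).
N_QUESTIONS = 259

reported_pct = {
    "GPT-4": 76.8,
    "Copilot": 72.6,
    "GPT-3.5": 62.2,
    "Gemini": 58.7,
    "Llama 2": 42.5,
}


def reconstruct_correct(pct, n=N_QUESTIONS):
    """Estimate the number of correct answers from a rounded percentage."""
    return round(pct / 100 * n)


def chi_square_independence(correct_counts, n=N_QUESTIONS):
    """Return (statistic, degrees of freedom) for a k x 2 contingency table."""
    k = len(correct_counts)
    pooled = sum(correct_counts) / (k * n)  # pooled proportion correct
    exp_correct = n * pooled                # expected correct per model
    exp_wrong = n * (1 - pooled)            # expected incorrect per model
    stat = 0.0
    for c in correct_counts:
        stat += (c - exp_correct) ** 2 / exp_correct
        stat += ((n - c) - exp_wrong) ** 2 / exp_wrong
    return stat, k - 1                      # df = (k - 1) * (2 - 1)


correct = {model: reconstruct_correct(p) for model, p in reported_pct.items()}
chi2, df = chi_square_independence(list(correct.values()))
mean_pct = 100 * sum(correct.values()) / (len(correct) * N_QUESTIONS)

print(f"chi2 = {chi2:.1f}, df = {df}, mean = {mean_pct:.1f}%")
# prints "chi2 = 79.9, df = 4, mean = 62.5%"
```

Under these reconstructed counts the statistic, degrees of freedom, and mean score agree with the values stated in the abstract. If SciPy is available, `scipy.stats.chi2_contingency` applied to the same table should also return the corresponding P value.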
Pages: 881-886
Number of pages: 6