Performance of large language models in oral and maxillofacial surgery examinations

Cited by: 2
Authors
Quah, B. [1, 2]
Yong, C. W. [1, 2]
Lai, C. W. M. [1]
Islam, I. [1, 2]
Affiliations
[1] Natl Univ Singapore, Fac Dent, 9 Lower Kent Ridge Rd, Singapore 119085, Singapore
[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, Singapore, Singapore
Keywords
Artificial intelligence; Oral surgery; Dental education; Academic performance; Dentistry
DOI
10.1016/j.ijom.2024.06.003
Chinese Library Classification (CLC) number
R78 [Stomatology]
Discipline classification code
1003
Abstract
This study aimed to determine the accuracy of large language models (LLMs) in answering oral and maxillofacial surgery (OMS) multiple choice questions. A total of 259 questions from the university's question bank were answered by the LLMs (GPT-3.5, GPT-4, Llama 2, Gemini, and Copilot). The scores per category as well as the total score out of 259 were recorded and evaluated, with the passing score set at 50%. The mean overall score amongst all LLMs was 62.5%. GPT-4 performed the best (76.8%, 95% confidence interval (CI) 71.4-82.2%), followed by Copilot (72.6%, 95% CI 67.2-78.0%), GPT-3.5 (62.2%, 95% CI 56.4-68.0%), Gemini (58.7%, 95% CI 52.9-64.5%), and Llama 2 (42.5%, 95% CI 37.1-48.6%). There was a statistically significant difference between the scores of the five LLMs overall (χ² = 79.9, df = 4, P < 0.001) and within all categories except 'basic sciences' (P = 0.129), 'dentoalveolar and implant surgery' (P = 0.052), and 'oral medicine/pathology/radiology' (P = 0.801). The LLMs performed best in 'basic sciences' (68.9%) and poorest in 'pharmacology' (45.9%). The LLMs can be used as adjuncts in teaching, but should not be used for clinical decision-making until the models are further developed and validated.
Pages: 881-886
Number of pages: 6
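
The overall comparison reported in the abstract (a chi-square statistic of 79.9 with df = 4 across the five LLMs) can be approximately re-derived from the published percentages. The sketch below is illustrative only and is not the authors' analysis code: the correct-answer counts are an assumption, reconstructed by rounding each reported percentage of 259 questions, and the function names are hypothetical. It uses a standard Pearson chi-square test of homogeneity on a 5 x 2 table (correct vs. incorrect) and the closed-form upper-tail probability that holds for even degrees of freedom.

```python
import math

TOTAL_QUESTIONS = 259  # number of questions stated in the abstract

# Overall scores (%) as reported in the abstract.
REPORTED_PCT = {
    "GPT-4": 76.8,
    "Copilot": 72.6,
    "GPT-3.5": 62.2,
    "Gemini": 58.7,
    "Llama 2": 42.5,
}


def reconstruct_counts(pcts, n):
    """ASSUMPTION: raw counts are not given in the abstract, so they are
    approximated by rounding percentage * n to the nearest integer."""
    return {model: round(p / 100 * n) for model, p in pcts.items()}


def chi_square_homogeneity(correct, n):
    """Pearson chi-square for a k x 2 table (correct / incorrect per model),
    where every model answered the same n questions."""
    k = len(correct)
    pooled = sum(correct.values()) / (k * n)  # pooled proportion correct
    exp_correct = pooled * n
    exp_incorrect = n - exp_correct
    stat = sum(
        (c - exp_correct) ** 2 / exp_correct
        + ((n - c) - exp_incorrect) ** 2 / exp_incorrect
        for c in correct.values()
    )
    return stat, k - 1  # degrees of freedom = (k - 1) * (2 - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; this closed form
    is valid only when df is even."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


counts = reconstruct_counts(REPORTED_PCT, TOTAL_QUESTIONS)
stat, df = chi_square_homogeneity(counts, TOTAL_QUESTIONS)
p_value = chi2_sf_even_df(stat, df)
print(f"chi-square = {stat:.1f}, df = {df}, p = {p_value:.2e}")
```

With the reconstructed counts, the statistic comes out at about 79.9 with 4 degrees of freedom and a P value far below 0.001, in line with the abstract. The per-category tests cannot be reproduced this way, because the abstract does not give the per-category question counts.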