Can a large language model create acceptable dental board-style examination questions? A cross-sectional prospective study

Citations: 0
Authors
Kim, Hak-Sun [1 ]
Kim, Gyu-Tae [2 ]
Affiliations
[1] Kyung Hee Univ, Dept Oral & Maxillofacial Radiol, Dent Hosp, Seoul, South Korea
[2] Kyung Hee Univ, Coll Dent, Dept Oral & Maxillofacial Surg, 26 Kyungheedae Ro, Seoul 02447, South Korea
Keywords
Dental education; Examination questions; Professional competence; Artificial intelligence; Natural language processing
DOI
10.1016/j.jds.2024.08.020
Chinese Library Classification
R78 [Stomatology]
Discipline Classification Code
1003
Abstract
Background/purpose: Numerous studies have shown that large language models (LLMs) can score above the passing grade on various board examinations. This study therefore used item analysis to evaluate national dental board-style examination questions created by an LLM against those created by human experts.

Materials and methods: The study was conducted in June 2024 and included senior dental students (n = 30) who participated voluntarily. An LLM, ChatGPT 4o, was used to generate 44 national dental board-style examination questions based on textbook content. Twenty questions were randomly selected for the LLM set after factually incorrect (false) questions had been removed. Two experts created another set of 20 questions based on the same content and in the same style as the LLM set. Participating students answered all 40 questions, divided into the two sets, in a single classroom session using Google Forms. The responses were analyzed to assess the difficulty index, discrimination index, and distractor efficiency of each item. Statistical comparisons were performed using the Wilcoxon signed-rank test or the linear-by-linear association test at a 95% confidence level.

Results: The response rate was 100%. The median difficulty indices of the LLM and human sets were 55.00% and 50.00%, respectively, both within the "excellent" range. The median discrimination indices were 0.29 for the LLM set and 0.14 for the human set. Both sets had a median distractor efficiency of 80.00%. Differences in all criteria were not statistically significant (P > 0.05).

Conclusion: An LLM can create national board-style examination questions of quality equivalent to those created by human experts.

(c) 2025 Association for Dental Sciences of the Republic of China. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 895-900
Page count: 6
Related Papers (50 in total)
  • [1] Artificial Intelligence for Anesthesiology Board-Style Examination Questions: Role of Large Language Models
    Khan, Adnan A.
    Yunus, Rayaan
    Sohail, Mahad
    Rehman, Taha A.
    Saeed, Shirin
    Bu, Yifan
    Jackson, Cullen D.
    Sharkey, Aidan
    Mahmood, Feroze
    Matyal, Robina
    JOURNAL OF CARDIOTHORACIC AND VASCULAR ANESTHESIA, 2024, 38 (05): 1251-1259
  • [2] Performance of Large Language Models on a Neurology Board-Style Examination
    Schubert, Marc Cicero
    Wick, Wolfgang
    Venkataramani, Varun
    JAMA NETWORK OPEN, 2023, 6 (12): e2346721
  • [3] Performance of Generative Large Language Models on Ophthalmology Board-Style Questions
    Cai, Louis Z.
    Shaheen, Abdulla
    Jin, Andrew
    Fukui, Riya
    Yi, Jonathan S.
    Yannuzzi, Nicolas
    Alabiad, Chrisfouad
    AMERICAN JOURNAL OF OPHTHALMOLOGY, 2023, 254: 141-149
  • [4] Llama 3 Challenges Proprietary State-of-the-Art Large Language Models in Radiology Board-style Examination Questions
    Adams, Lisa C.
    Truhn, Daniel
    Busch, Felix
    Dorfner, Felix
    Nawabi, Jawed
    Makowski, Marcus R.
    Bressem, Keno K.
    RADIOLOGY, 2024, 312 (02)
  • [5] Performance of Publicly Available Large Language Models on Internal Medicine Board-style Questions
    Tarabanis, Constantine
    Zahid, Sohail
    Mamalis, Marios
    Zhang, Kevin
    Kalampokis, Evangelos
    Jankelson, Lior
    PLOS DIGITAL HEALTH, 2024, 3 (09)
  • [6] Large Language Models as Tools to Generate Radiology Board-Style Multiple-Choice Questions
    Mistry, Neel P.
    Saeed, Huzaifa
    Rafique, Sidra
    Le, Thuy
    Obaid, Haron
    Adams, Scott J.
    ACADEMIC RADIOLOGY, 2024, 31 (09): 3872-3878
  • [7] Accuracy of large language models in answering ophthalmology board-style questions: A meta-analysis
    Wu, Jo-Hsuan
    Nishida, Takashi
    Liu, T. Y. Alvin
    ASIA-PACIFIC JOURNAL OF OPHTHALMOLOGY, 2024, 13 (05)
  • [8] Performance of large language models on a neurology board-style examination (vol 6, e2346721, 2023)
    Schubert, M. C.
    Wick, W.
    JAMA NETWORK OPEN, 2024, 7 (01)
  • [9] The performance of artificial intelligence language models in board-style dental knowledge assessment: A preliminary study on ChatGPT
    Danesh, Arman
    Pazouki, Hirad
    Danesh, Kasra
    Danesh, Farzad
    Danesh, Arsalan
    JOURNAL OF THE AMERICAN DENTAL ASSOCIATION, 2023, 154 (11): 970-974
  • [10] Artificial Intelligence Showdown in Gastroenterology: A Comparative Analysis of Large Language Models (LLMs) in Tackling Board-Style Review Questions
    Shah, Kevin P.
    Dey, Shirin A.
    Pothula, Shravya
    Abud, Arnold
    Jain, Sukrit
    Srivastava, Aniruddha
    Dommaraju, Sagar
    Komanduri, Srinadh
    AMERICAN JOURNAL OF GASTROENTEROLOGY, 2024, 119 (10S): S1567-S1568