Can a large language model create acceptable dental board-style examination questions? A cross-sectional prospective study

Cited by: 0
Authors
Kim, Hak-Sun [1 ]
Kim, Gyu-Tae [2 ]
Affiliations
[1] Kyung Hee Univ, Dept Oral & Maxillofacial Radiol, Dent Hosp, Seoul, South Korea
[2] Kyung Hee Univ, Coll Dent, Dept Oral & Maxillofacial Surg, 26 Kyungheedae Ro, Seoul 02447, South Korea
Keywords
Dental education; Examination questions; Professional competence; Artificial intelligence; Natural language processing
DOI
10.1016/j.jds.2024.08.020
Chinese Library Classification (CLC)
R78 [Stomatology]
Discipline code
1003
Abstract
Background/purpose: Numerous studies have shown that large language models (LLMs) can score above the passing grade on various board examinations. This study therefore aimed to compare national dental board-style examination questions created by an LLM with those created by human experts, using item analysis.

Materials and methods: This study was conducted in June 2024 and included senior dental students (n = 30) who participated voluntarily. An LLM, ChatGPT 4o, was used to generate 44 national dental board-style examination questions based on textbook content. Twenty questions were randomly selected for the LLM set after false questions had been removed. Two experts created another set of 20 questions based on the same content and in the same style as the LLM set. The participating students answered all 40 questions, divided into the two sets, at the same sitting using Google Forms in the classroom. The responses were analyzed to assess the difficulty index, discrimination index, and distractor efficiency of each item. Statistical comparisons were performed using the Wilcoxon signed-rank test or the linear-by-linear association test, at a 95% confidence level.

Results: The response rate was 100%. The median difficulty indices of the LLM and human sets were 55.00% and 50.00%, respectively, both within the "excellent" range. The median discrimination indices were 0.29 for the LLM set and 0.14 for the human set. Both sets had a median distractor efficiency of 80.00%. None of the differences was statistically significant (P > 0.050).

Conclusion: An LLM can create national board-style examination questions of quality equivalent to those created by human experts.

(c) 2025 Association for Dental Sciences of the Republic of China. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
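The three item-analysis metrics named in the abstract (difficulty index, discrimination index, distractor efficiency) are standard in medical-education measurement. The abstract does not state the exact formulas the authors used, so the sketch below assumes the conventional definitions: difficulty as percent correct, discrimination from the upper/lower 27% of examinees ranked by total score, and a distractor counted as "functional" when at least 5% of examinees choose it.

```python
def difficulty_index(correct_flags):
    """Percentage of examinees who answered the item correctly."""
    return 100.0 * sum(correct_flags) / len(correct_flags)

def discrimination_index(total_scores, correct_flags, fraction=0.27):
    """(upper-group correct - lower-group correct) / group size,
    with the upper/lower groups taken as the top/bottom `fraction`
    of examinees ranked by total test score (27% is conventional)."""
    n = len(total_scores)
    k = max(1, int(round(fraction * n)))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = sum(correct_flags[i] for i in ranked[:k])
    lower = sum(correct_flags[i] for i in ranked[-k:])
    return (upper - lower) / k

def distractor_efficiency(choice_counts, correct_option, threshold=0.05):
    """Percentage of distractors that are 'functional', i.e. chosen
    by at least `threshold` (conventionally 5%) of examinees."""
    total = sum(choice_counts.values())
    distractors = [opt for opt in choice_counts if opt != correct_option]
    functional = [opt for opt in distractors
                  if choice_counts[opt] / total >= threshold]
    return 100.0 * len(functional) / len(distractors)
```

For example, an item answered correctly by 3 of 4 students has a difficulty index of 75%, and a 5-option item whose correct answer is A, with one distractor never chosen, has a distractor efficiency below 100%.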
Pages: 895-900
Page count: 6
Related papers
(50 records in total)
  • [31] Quality of life and depression in Wilson's disease: a large prospective cross-sectional study. Chevalier, Kevin; Rahli, Djamila; de Veyrac, Louise; Guillaume, Jessica; Obadia, Michaël Alexandre; Poujois, Aurélia. Orphanet Journal of Rare Diseases, 2023, 18 (01).
  • [33] Major Factors Causing Examination Anxiety in Undergraduate Dental Students - A Questionnaire Based Cross-Sectional Study. Al-Sahman, Lujain Abdulrhman; Al-Sahman, Roba Abdulrhman; Joseph, Betsy; Javali, Mukhatar Ahmed. Annals of Medical and Health Sciences Research, 2019, 9 (06): 691-694.
  • [34] The Accuracy of the Multimodal Large Language Model GPT-4 on Sample Questions From the Interventional Radiology Board Examination Response. Ariyaratne, Sisith; Jenko, Nathan; Davies, A. Mark; Iyengar, Karthikeyan P.; Botchu, Rajesh. Academic Radiology, 2024, 31 (08): 3477.
  • [35] Exploring the Relationship Between Leadership Style and Safety Climate in a Large Scale Danish Cross-Sectional Study. Sonderstrup-Andersen, Hans H. K.; Carlsen, Kathrine; Kines, Pete; Bjorner, Jakob B.; Roepstorff, Christian. Safety Science Monitor, 2011, 15 (01).
  • [36] The reliability of freely accessible, baseline, general-purpose large language model generated patient information for frequently asked questions on liver disease: a preliminary cross-sectional study. Niriella, Madunil A.; Premaratna, Pathum; Senanayake, Mananjala; Kodisinghe, Senerath; Dassanayake, Uditha; Dassanayake, Anuradha; Ediriweera, Dileepa S.; de Silva, H. Janaka. Expert Review of Gastroenterology & Hepatology, 2025.
  • [37] Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis. He, Wenjie; Zhang, Wenyan; Jin, Ya; Zhou, Qiang; Zhang, Huadan; Xia, Qing. Journal of Medical Internet Research, 2024, 26.
  • [38] Evaluating Large Language Models for Automated Reporting and Data Systems Categorization: Cross-Sectional Study. Wu, Qingxia; Li, Huali; Wang, Yan; Bai, Yan; Wu, Yaping; Yu, Xuan; Li, Xiaodong; Dong, Pei; Xue, Jon; Shen, Dinggang; Wang, Meiyun. JMIR Medical Informatics, 2024, 12.
  • [39] Can Large Language Models Aid Caregivers of Pediatric Cancer Patients in Information Seeking? A Cross-Sectional Investigation. Sezgin, Emre; Jackson, Daniel I.; Kocaballi, A. Baki; Bibart, Mindy; Zupanec, Sue; Landier, Wendy; Audino, Anthony; Ranalli, Mark; Skeens, Micah. Cancer Medicine, 2025, 14 (01).
  • [40] Prospective Evaluation of Dental Practitioners' Knowledge, Attitude, and Practice Toward Adult Dental Pain Management: A Cross-Sectional Multicenter Study. Shukla, Kirti; Pebbili, Kranthi Kiran; Bhagat, Seema V.; Rathod, Rahul; Kotak, Bhavesh P. Cureus Journal of Medical Science, 2024, 16 (03).