Reliability of ChatGPT in automated essay scoring for dental undergraduate examinations

Cited by: 5
Authors
Quah, Bernadette [1 ,2 ]
Zheng, Lei [1 ,2 ]
Sng, Timothy Jie Han [1 ,2 ]
Yong, Chee Weng [1 ,2 ]
Islam, Intekhab [1 ,2 ]
Affiliations
[1] Natl Univ Singapore, Fac Dent, Singapore, Singapore
[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, 9 Lower Kent Ridge Rd, Singapore, Singapore
Keywords
Artificial intelligence; Education, Dental; Academic performance; Models, Educational; Mentoring; Educational needs assessment; Medical education
DOI
10.1186/s12909-024-05881-6
CLC classification
G40 [Education]
Subject classification codes
040101; 120403
Abstract
Background: This study aimed to answer the research question: How reliable is ChatGPT in automated essay scoring (AES) for oral and maxillofacial surgery (OMS) examinations for dental undergraduate students compared with human assessors?
Methods: Sixty-nine undergraduate dental students participated in a closed-book examination comprising two essays at the National University of Singapore. Using pre-created assessment rubrics, three assessors independently performed manual essay scoring, while one separate assessor performed AES using ChatGPT (GPT-4). Data were analysed using the intraclass correlation coefficient and Cronbach's alpha to evaluate the reliability and inter-rater agreement of the test scores among all assessors. The mean scores of manual versus automated scoring were evaluated for similarity and correlation.
Results: A strong correlation between AES and all manual scorers was observed for Question 1 (r = 0.752-0.848, p < 0.001) and a moderate correlation for Question 2 (r = 0.527-0.571, p < 0.001). Intraclass correlation coefficients of 0.794-0.858 indicated excellent inter-rater agreement, and Cronbach's alpha values of 0.881-0.932 indicated high reliability. For Question 1, mean AES scores were similar to mean manual scores (p > 0.05), with a strong correlation between AES and manual scores (r = 0.829, p < 0.001). For Question 2, AES scores were significantly lower than manual scores (p < 0.001), with a moderate correlation between AES and manual scores (r = 0.599, p < 0.001).
Conclusion: This study demonstrates the potential of ChatGPT for essay marking; however, appropriate rubric design is essential for optimal reliability. With further validation, ChatGPT could aid student self-assessment or large-scale automated marking.
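Illustrative note: the abstract does not state which software the authors used for the statistical analysis. Below is a minimal Python sketch of the kind of reliability analysis described (intraclass correlation, Cronbach's alpha, and Pearson correlation between AES and manual scores), assuming the pandas, pingouin, and scipy libraries and entirely hypothetical score data; it is not the authors' code.

# Sketch only: reliability statistics of the kind reported in the abstract,
# computed on hypothetical scores; not the authors' actual data or analysis.
import pandas as pd
import pingouin as pg
from scipy.stats import pearsonr

# Long-format data: one row per (essay, rater) pair. Raters A, B, C stand in
# for the manual assessors; "AES" holds the ChatGPT-generated scores.
scores = pd.DataFrame({
    "essay": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    "rater": ["A", "B", "C", "AES"] * 4,
    "score": [14, 15, 13, 14, 10, 11, 10, 8, 17, 16, 18, 17, 12, 13, 12, 11],
})

# Intraclass correlation coefficient: inter-rater agreement across all assessors.
icc = pg.intraclass_corr(data=scores, targets="essay", raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Cronbach's alpha expects a wide table with one column per rater.
wide = scores.pivot(index="essay", columns="rater", values="score")
alpha, ci = pg.cronbach_alpha(data=wide)
print(f"Cronbach's alpha = {alpha:.3f}")

# Pearson correlation between AES scores and the mean of the manual scores.
manual_mean = wide[["A", "B", "C"]].mean(axis=1)
r, p = pearsonr(wide["AES"], manual_mean)
print(f"r = {r:.3f}, p = {p:.3g}")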
Pages: 12