Reliability of ChatGPT in automated essay scoring for dental undergraduate examinations

Cited by: 5
Authors
Quah, Bernadette [1,2]
Zheng, Lei [1,2]
Sng, Timothy Jie Han [1,2]
Yong, Chee Weng [1,2]
Islam, Intekhab [1,2]
Affiliations
[1] Natl Univ Singapore, Fac Dent, Singapore, Singapore
[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, 9 Lower Kent Ridge Rd, Singapore, Singapore
Keywords
Artificial intelligence; Education, Dental; Academic performance; Models, Educational; Mentoring; Educational needs assessment; Medical education
DOI
10.1186/s12909-024-05881-6
Chinese Library Classification (CLC): G40 [Education]
Subject classification codes: 040101; 120403
Abstract
Background: This study aimed to answer the research question: how reliable is ChatGPT in automated essay scoring (AES) for oral and maxillofacial surgery (OMS) examinations for dental undergraduate students compared with human assessors?

Methods: Sixty-nine undergraduate dental students at the National University of Singapore sat a closed-book examination comprising two essays. Using pre-created assessment rubrics, three assessors independently performed manual essay scoring, while one separate assessor performed AES using ChatGPT (GPT-4). The intraclass correlation coefficient and Cronbach's alpha were used to evaluate inter-rater agreement and the reliability of the test scores among all assessors. Mean scores from manual versus automated scoring were evaluated for similarity and correlation.

Results: Between AES and all manual scorers, a strong correlation was observed for Question 1 (r = 0.752-0.848, p < 0.001) and a moderate correlation for Question 2 (r = 0.527-0.571, p < 0.001). Intraclass correlation coefficients of 0.794-0.858 indicated excellent inter-rater agreement, and Cronbach's alpha values of 0.881-0.932 indicated high reliability. For Question 1, mean AES scores were similar to manual scores (p > 0.05), with a strong correlation between the two (r = 0.829, p < 0.001). For Question 2, AES scores were significantly lower than manual scores (p < 0.001), with a moderate correlation (r = 0.599, p < 0.001).

Conclusion: This study shows the potential of ChatGPT for essay marking; however, appropriate rubric design is essential for optimal reliability. With further validation, ChatGPT could aid students in self-assessment or support large-scale automated marking.
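The reliability analysis described in the abstract (intraclass correlation, Cronbach's alpha, Pearson correlation, and a comparison of mean AES versus manual scores) can be illustrated with a short sketch. The paper does not publish its analysis code or specify the ICC form, so the following is a minimal Python sketch under assumed conventions: ICC(2,1) (two-way random effects, absolute agreement, single rater), raters treated as "items" for Cronbach's alpha, and randomly generated placeholder marks standing in for the 69 students' scores (three manual raters plus one ChatGPT AES column).

```python
import numpy as np
from scipy import stats

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha with each rater treated as an 'item'.
    scores: (n_students, k_raters) matrix of essay marks."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1) per Shrout & Fleiss: two-way random effects,
    absolute agreement, single-rater reliability."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()  # between students
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)              # between-subjects mean square
    ms_c = ss_cols / (k - 1)              # between-raters mean square
    ms_e = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Placeholder data (NOT the study's data): 69 students scored by
# 3 manual raters plus 1 AES (ChatGPT) column.
rng = np.random.default_rng(42)
true_quality = rng.normal(60, 10, size=(69, 1))
scores = true_quality + rng.normal(0, 4, size=(69, 4))

print(f"Cronbach's alpha: {cronbach_alpha(scores):.3f}")
print(f"ICC(2,1):         {icc_2_1(scores):.3f}")

# AES vs. mean manual score: Pearson correlation and a paired
# comparison of means, mirroring the abstract's r values and its
# p > 0.05 / p < 0.001 similarity tests.
manual_mean = scores[:, :3].mean(axis=1)
aes = scores[:, 3]
r, p_r = stats.pearsonr(manual_mean, aes)
t, p_t = stats.ttest_rel(manual_mean, aes)
print(f"Pearson r = {r:.3f} (p = {p_r:.3g}); paired t = {t:.3f} (p = {p_t:.3g})")
```

The ICC here pools all four rater columns, matching the abstract's "inter-rater agreement ... among all assessors". The abstract does not state whether a paired t-test or a non-parametric equivalent was used for the mean comparison, so ttest_rel is an assumption for illustration only.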
Pages: 12