Reliability of ChatGPT in automated essay scoring for dental undergraduate examinations

Cited by: 5
Authors
Quah, Bernadette [1 ,2 ]
Zheng, Lei [1 ,2 ]
Sng, Timothy Jie Han [1 ,2 ]
Yong, Chee Weng [1 ,2 ]
Islam, Intekhab [1 ,2 ]
Affiliations
[1] Natl Univ Singapore, Fac Dent, Singapore, Singapore
[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, 9 Lower Kent Ridge Rd, Singapore, Singapore
Keywords
Artificial intelligence; Education, Dental; Academic performance; Models, Educational; Mentoring; Educational needs assessment; Medical education
DOI
10.1186/s12909-024-05881-6
Chinese Library Classification: G40 [Education]
Subject classification codes: 040101; 120403
Abstract
Background: This study aimed to answer the research question: how reliable is ChatGPT in automated essay scoring (AES) for oral and maxillofacial surgery (OMS) examinations for dental undergraduate students compared with human assessors?
Methods: Sixty-nine undergraduate dental students at the National University of Singapore sat a closed-book examination comprising two essays. Using pre-created assessment rubrics, three assessors independently performed manual essay scoring, while one separate assessor performed AES using ChatGPT (GPT-4). The intraclass correlation coefficient and Cronbach's alpha were used to evaluate inter-rater agreement and reliability of the scores across all assessors, and the mean scores of manual versus automated scoring were compared for similarity and correlation.
Results: Between AES and all manual scorers, a strong correlation was observed for Question 1 (r = 0.752-0.848, p < 0.001) and a moderate correlation for Question 2 (r = 0.527-0.571, p < 0.001). Intraclass correlation coefficients of 0.794-0.858 indicated excellent inter-rater agreement, and Cronbach's alpha values of 0.881-0.932 indicated high reliability. For Question 1, mean AES scores were similar to manual scores (p > 0.05), with a strong correlation between the two (r = 0.829, p < 0.001). For Question 2, AES scores were significantly lower than manual scores (p < 0.001), with a moderate correlation (r = 0.599, p < 0.001).
Conclusion: This study shows the potential of ChatGPT for essay marking; however, appropriate rubric design is essential for optimal reliability. With further validation, ChatGPT could aid students in self-assessment or support large-scale automated marking.
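The reliability statistics reported in the abstract (Cronbach's alpha, intraclass correlation, Pearson correlation) can be computed from any student-by-rater score table. The sketch below is not the study's analysis code; it assumes a hypothetical 69 x 4 matrix (three human raters plus ChatGPT) with made-up simulated scores, and shows one standard way to compute Cronbach's alpha, ICC(2,1), and a Pearson correlation in Python with numpy and scipy.

import numpy as np
from scipy.stats import pearsonr


def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_students x n_raters) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_raters = scores.shape[1]
    rater_vars = scores.var(axis=0, ddof=1)        # variance of each rater's scores
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of per-student total scores
    return (n_raters / (n_raters - 1)) * (1 - rater_vars.sum() / total_var)


def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater (Shrout & Fleiss)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()   # between-students
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()   # between-raters
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 69 students, three human raters (columns 0-2) and ChatGPT (column 3).
    true_ability = rng.normal(14, 3, size=69)
    scores = np.column_stack(
        [true_ability + rng.normal(0, 1.5, size=69) for _ in range(4)]
    )
    print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))
    print("ICC(2,1):", round(icc_2_1(scores), 3))
    # Correlation between ChatGPT scores and the mean of the human raters.
    r, p = pearsonr(scores[:, 3], scores[:, :3].mean(axis=1))
    print(f"Pearson r = {r:.3f}, p = {p:.3g}")

Shrout and Fleiss describe several ICC forms; ICC(2,1) is shown here only as a common choice, since the abstract does not specify which form the authors used.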
Pages: 12