Large language models and automated essay scoring of English language learner writing: Insights into validity and reliability

Cited by: 0

Authors:

Pack A. [1]

Barrett A. [2]

Escalante J. [1]

Affiliations:

[1] Faculty of Education and Social Work, Brigham Young University-Hawaii, 55-220 Kulanui Street Bldg 5, Laie, HI 96762-1293

[2] College of Education, Florida State University, Stone Building, 114 West Call Street, Tallahassee, FL 32306-2400

Source:

Computers and Education: Artificial Intelligence, 2024, Vol. 6

Keywords:

Artificial intelligence; Automatic essay scoring; Automatic writing evaluation; ChatGPT; Generative AI; Large language model

DOI:

10.1016/j.caeai.2024.100234


Abstract:

Advances in generative AI, such as large language models (LLMs), may offer a solution to the burdensome task of essay grading that language teachers often face. Yet the validity and reliability of using LLMs for automatic essay scoring (AES) in language education are not well understood. To address this gap, we evaluated the cross-sectional and longitudinal validity and reliability of four prominent LLMs (Google's PaLM 2, Anthropic's Claude 2, and OpenAI's GPT-3.5 and GPT-4) for the AES of English language learners' writing. A total of 119 essays from an English language placement test were scored by each LLM on two separate occasions, as well as by a pair of human raters. GPT-4 performed best, demonstrating excellent intrarater reliability and good validity. All models except GPT-3.5 improved in intrarater reliability over time. The interrater reliability of GPT-3.5 and GPT-4, however, decreased slightly over time. These findings indicate that some models perform better than others in AES and that all models are subject to fluctuations in performance. We discuss potential reasons for this variability and suggest avenues for future research. © 2024 The Authors
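The abstract reports intrarater reliability, meaning how consistently a single rater scores the same essays on two occasions, but this record does not say which statistic the authors used. As a minimal sketch, assuming integer rubric scores on a fixed scale and using quadratic weighted kappa (a common agreement measure in AES work, chosen here purely for illustration and not necessarily the authors' method), agreement between two scoring runs could be computed as below. The function name `quadratic_weighted_kappa`, the 1 to 5 score range, and the example scores are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(
    first: Sequence[int], second: Sequence[int], min_score: int, max_score: int
) -> float:
    """Quadratic weighted kappa between two ratings of the same essays.

    Hypothetical helper for illustration: 1.0 means perfect agreement,
    0.0 chance-level agreement, and negative values worse than chance.
    """
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    if any(not min_score <= s <= max_score for s in (*first, *second)):
        raise ValueError("scores must lie within [min_score, max_score]")
    levels = max_score - min_score + 1
    n = len(first)

    # observed[i][j] counts essays scored i on one occasion and j on the other.
    observed = [[0] * levels for _ in range(levels)]
    for x, y in zip(first, second):
        observed[x - min_score][y - min_score] += 1

    # Marginal score distributions, used to build the chance-expected counts.
    hist_first = Counter(x - min_score for x in first)
    hist_second = Counter(y - min_score for y in second)

    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(levels):
        for j in range(levels):
            # Quadratic weight: larger score gaps are penalized more heavily.
            weight = (i - j) ** 2 / (levels - 1) ** 2
            expected = hist_first[i] * hist_second[j] / n
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * expected

    if disagreement_expected == 0.0:
        # Both ratings used one identical score throughout: treat as full agreement.
        return 1.0
    return 1.0 - disagreement_observed / disagreement_expected


# Hypothetical scores for the same five essays from two runs of one model.
run_1 = [3, 4, 2, 5, 4]
run_2 = [3, 4, 2, 5, 4]
print(quadratic_weighted_kappa(run_1, run_2, min_score=1, max_score=5))  # 1.0
```

The same function could equally compare one model's scores with a human rater's scores; quadratic weights are used because a gap of two rubric points is usually treated as a more serious disagreement than a gap of one.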
