Validity Arguments for Automated Essay Scoring of Young Students' Writing Traits

Cited by: 3
Authors
Hannah, L. [1 ,2 ]
Jang, E. E. [1 ]
Shah, M. [1 ]
Gupta, V. [1 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Univ Toronto, Ontario Inst Studies Educ, Dept Appl Psychol & Human Dev, Toronto, ON, Canada
Keywords
English-language learners; Text
DOI
10.1080/15434303.2023.2288253
Chinese Library Classification
G44 [Educational Psychology]
Discipline Code
0402; 040202
Abstract
Machines have a long-demonstrated ability to find statistical relationships between qualities of texts and surface-level linguistic indicators of writing. More recently, advances in artificial intelligence have uncovered the potential of using machines to identify content-related writing trait criteria. This development is significant, especially in formative assessment contexts where feedback is key. Yet the extent to which writing traits can be validly scored by machines remains under-researched, especially in the K-12 context. The present study investigated the validity of machine learning (ML) models designed to score three writing traits in the responses of students in grades 3-6: task fulfillment, organization and coherence, and vocabulary and expression. The study used an argument-based approach focused on two primary inferences: evaluation and explanation. The evaluation inference examined human-machine score alignment, the models' ability to detect off-topic and gibberish responses, and the consistency of human-machine score alignment across grades and language backgrounds. The explanation inference examined the relevance of the features used in the models. Results indicated that human-machine score alignment was sufficient for all writing traits; however, validity concerns were raised about the models' performance in detecting off-topic and gibberish responses and about consistency across sub-groups. Implications for language assessment professionals and other educators are discussed.
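The abstract reports human-machine score alignment but does not name the agreement index used; a common choice in automated essay scoring research is quadratic weighted kappa (QWK). The minimal sketch below, with hypothetical trait scores on an assumed 1-4 scale, illustrates how such alignment is typically computed; it is not the article's actual evaluation procedure.

```python
# Illustrative sketch (assumption, not from the article): quadratic weighted
# kappa (QWK) as an index of human-machine score alignment for a rubric trait.
from sklearn.metrics import cohen_kappa_score

# Hypothetical trait scores (1-4 scale) assigned to the same set of essays.
human_scores   = [3, 2, 4, 1, 3, 2, 4, 3]   # trained human rater
machine_scores = [3, 2, 3, 1, 3, 3, 4, 3]   # ML model prediction

# Quadratic weighting penalizes large disagreements more than adjacent ones.
qwk = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")
print(f"Quadratic weighted kappa: {qwk:.2f}")
```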
Pages: 399-420
Number of pages: 22