Who's the Best Detective? Large Language Models vs. Traditional Machine Learning in Detecting Incoherent Fourth Grade Math Answers

Cited by: 5
Authors
Urrutia, Felipe [1, 3]
Araya, Roberto [2, 4]
Affiliations
[1] Univ Chile, Ctr Adv Res Educ, Santiago, Chile
[2] Univ Chile, Inst Educ, Santiago, Chile
[3] Univ Chile, Av Beauchef 851, Santiago 1025000, Region Metropol, Chile
[4] Inst Educ, Periodista Jose Carrasco Tapia 75, Santiago, Chile
Keywords
detecting incoherent answers; fourth grade math; large language models; machine learning; open-ended questions; teacher review; automation; recursive questions; fourth-grade misspellings
DOI
10.1177/07356331231191174
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification code
040101; 120403
Abstract
Written answers to open-ended questions can have a greater long-term effect on learning than multiple-choice questions. However, it is critical that teachers review the answers immediately and ask students to redo those that are incoherent. This task can be difficult and time-consuming for teachers. A possible solution is to automate the detection of incoherent answers. One option is to automate the review with Large Language Models (LLMs), whose powerful discursive ability can be used to explain decisions. In this paper, we analyze the responses of fourth graders in mathematics using three LLMs: GPT-3, BLOOM, and YOU. We used them with zero, one, two, three, and four shots. We compared their performance with the results of various classifiers trained with Machine Learning (ML). We found that the LLMs perform worse than the ML classifiers in detecting incoherent answers. The difficulty seems to reside in recursive questions that contain both questions and answers, and in responses from students with typical fourth-grader misspellings. Upon closer examination, we found that the ChatGPT model faces the same challenges.
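The zero- to four-shot setup mentioned in the abstract can be illustrated with a minimal sketch of how a k-shot prompt might be assembled. This is not the authors' prompt: the instruction wording, the `build_prompt` helper, and the example question-answer pairs are all hypothetical, invented here only to show the mechanics.

```python
from typing import List, Tuple

# (question, student answer, label); invented examples, NOT from the paper's data.
Example = Tuple[str, str, str]

EXAMPLES: List[Example] = [
    ("How many apples are 3 plus 4 apples?", "7 apples", "coherent"),
    ("What is half of 10?", "my dog is brown", "incoherent"),
    ("Is 8 greater than 5? Explain.", "yes becuase 8 is biger", "coherent"),
    ("How many sides does a square have?", "asdfgh", "incoherent"),
]

# Hypothetical instruction text; the paper's actual wording is not given here.
INSTRUCTION = (
    "Decide whether the student's answer is coherent or incoherent "
    "with respect to the question."
)


def build_prompt(examples: List[Example], question: str, answer: str, shots: int) -> str:
    """Build a prompt with `shots` labeled examples followed by the query."""
    if not 0 <= shots <= len(examples):
        raise ValueError("shots must be between 0 and the number of examples")
    lines = [INSTRUCTION, ""]
    # Each shot is a fully labeled question/answer pair.
    for q, a, label in examples[:shots]:
        lines += [f"Question: {q}", f"Answer: {a}", f"Label: {label}", ""]
    # The query ends with an empty label for the model to complete.
    lines += [f"Question: {question}", f"Answer: {answer}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(EXAMPLES, "What is 6 times 2?", "twelv", shots=2))
```

With `shots=0` the prompt contains only the instruction and the query; raising `shots` prepends more labeled pairs, which is the only thing that changes between the zero-shot and four-shot conditions in this sketch.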
Pages: 187-218
Page count: 32