Evaluating Language Models for Generating and Judging Programming Feedback

Citations: 0
Authors
Koutcheme, Charles [1 ]
Dainese, Nicola [1 ]
Sarsa, Sami [2 ]
Hellas, Arto [1 ]
Leinonen, Juho [1 ]
Ashraf, Syed [1 ]
Denny, Paul [3 ]
Affiliations
[1] Aalto Univ, Espoo, Finland
[2] Univ Jyvaskyla, Jyvaskyla, Finland
[3] Univ Auckland, Auckland, New Zealand
Keywords
open source; large language models; generative AI; automatic feedback; automatic evaluation; programming feedback; LLM-as-a-judge
DOI
Not available
CLC Number
TP39 [Applications of Computers]
Discipline Codes
081203; 0835
Abstract
The emergence of large language models (LLMs) has transformed research and practice across a wide range of domains. Within the computing education research (CER) domain, LLMs have garnered significant attention, particularly in the context of learning programming. Much of the work on LLMs in CER, however, has focused on applying and evaluating proprietary models. In this article, we evaluate the efficiency of open-source LLMs in generating high-quality feedback for programming assignments and judging the quality of programming feedback, contrasting the results with proprietary models. Our evaluations on a dataset of students' submissions to introductory Python programming exercises suggest that state-of-the-art open-source LLMs are nearly on par with proprietary models in both generating and assessing programming feedback. Additionally, we demonstrate the efficiency of smaller LLMs in these tasks and highlight the wide range of LLMs accessible, even for free, to educators and practitioners.
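
The abstract describes two tasks: generating programming feedback with an LLM, and judging feedback quality with an LLM-as-a-judge. Below is a minimal Python sketch of both tasks, assuming an open-weight chat model served through the Hugging Face transformers text-generation pipeline; the model name, prompts, and yes/no judging criteria are illustrative placeholders, not the paper's actual experimental setup.

# Minimal sketch of the paper's two tasks: (1) generating feedback on a
# student submission with an open-weight LLM, (2) judging that feedback
# with an LLM-as-a-judge prompt. Model, prompts, and criteria below are
# illustrative assumptions, not the authors' actual setup.
from transformers import pipeline

# Any open-weight chat model works here; this one is a common example
# (gated on Hugging Face, so it requires an access token).
generate = pipeline("text-generation",
                    model="meta-llama/Meta-Llama-3-8B-Instruct")

exercise = "Write a function mean(xs) that returns the average of a list of numbers."
submission = '''def mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(x)  # bug: should be len(xs)
'''

# Task 1: generate feedback without revealing the full solution.
feedback_chat = generate(
    [{"role": "system",
      "content": "You are a programming tutor. Point out problems in the "
                 "student's code without giving away the full solution."},
     {"role": "user",
      "content": f"Exercise: {exercise}\n\nStudent submission:\n{submission}"}],
    max_new_tokens=256,
)
feedback = feedback_chat[0]["generated_text"][-1]["content"]

# Task 2: LLM-as-a-judge, scoring the feedback on simple yes/no criteria.
verdict_chat = generate(
    [{"role": "system",
      "content": "You evaluate programming feedback. Answer yes/no for each: "
                 "(a) identifies the actual bug, (b) is factually accurate, "
                 "(c) does not reveal the full solution."},
     {"role": "user",
      "content": f"Submission:\n{submission}\n\nFeedback:\n{feedback}"}],
    max_new_tokens=128,
)
print(verdict_chat[0]["generated_text"][-1]["content"])

In a setup like the paper's, such judge verdicts would then be compared against reference annotations of feedback quality; the criteria above are placeholders only.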
Pages: 624-630
Page count: 7
Related Papers
50 in total (first 10 shown below)
  • [1] Evaluating Language Models for Generating and Judging Programming Feedback
    Koutcheme, Charles
    Dainese, Nicola
    Sarsa, Sami
    Hellas, Arto
    Leinonen, Juho
    Ashraf, Syed
    Denny, Paul
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 1, 2025, : 624 - 630
  • [2] Evaluating the Application of Large Language Models to Generate Feedback in Programming Education
    Jacobs, Sven
    Jaschke, Steffen
    2024 IEEE GLOBAL ENGINEERING EDUCATION CONFERENCE, EDUCON 2024, 2024,
  • [3] Propagating Large Language Models Programming Feedback
    Koutcheme, Charles
    Hellas, Arto
    PROCEEDINGS OF THE ELEVENTH ACM CONFERENCE ON LEARNING@SCALE, L@S 2024, 2024, : 366 - 370
  • [4] Generating Automatic Feedback on UI Mockups with Large Language Models
    Duan, Peitong
    Warner, Jeremy
    Li, Yang
    Hartmann, Bjoern
    PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024), 2024,
  • [5] The study on C language programming judging system
    Shan, Shuqian
    Tian, Zhigang
    Ren, Jiaxun
    Chen, Jing
    2015 3RD INTERNATIONAL CONFERENCE ON SOFT COMPUTING IN INFORMATION COMMUNICATION TECHNOLOGY (SCICT 2015), 2015, : 13 - 15
  • [6] BeGrading: large language models for enhanced feedback in programming education
    Yousef, Mina
    Mohamed, Kareem
    Medhat, Walaa
    Mohamed, Ensaf Hussein
    Khoriba, Ghada
    Arafa, Tamer
    NEURAL COMPUTING AND APPLICATIONS, 2025, 37 (2) : 1027 - 1040
  • [7] Large Language Models (GPT) for automating feedback on programming assignments
    Pankiewicz, Maciej
    Baker, Ryan S.
    31ST INTERNATIONAL CONFERENCE ON COMPUTERS IN EDUCATION, ICCE 2023, VOL I, 2023, : 68 - 77
  • [8] Comparing Large Language Models and Human Programmers for Generating Programming Code
    Hou, Wenpin
    Ji, Zhicheng
    ADVANCED SCIENCE, 2025, 12 (08)
  • [9] Evaluating the Ability of Large Language Models to Generate Motivational Feedback
    Gaeta, Angelo
    Orciuoli, Francesco
    Pascuzzo, Antonella
    Peduto, Angela
    GENERATIVE INTELLIGENCE AND INTELLIGENT TUTORING SYSTEMS, PT I, ITS 2024, 2024, 14798 : 188 - 201
  • [10] Training Language Models for Programming Feedback Using Automated Repair Tools
    Koutcheme, Charles
    ARTIFICIAL INTELLIGENCE IN EDUCATION, AIED 2023, 2023, 13916 : 830 - 835