Feedback-Generation for Programming Exercises With GPT-4

Cited by: 6

Authors:

Azaiz, Imen [1]

Kiesler, Natalie [2]

Strickroth, Sven [1]

Affiliations:

[1] Ludwig Maximilians Univ Munchen, Munich, Germany

[2] Nuremberg Tech, Nurnberg, Germany

Source:

PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024 | 2024

Keywords:

formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4; Turbo; benchmarking

DOI:

10.1145/3649217.3653594

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

Ever since Large Language Models (LLMs) and related applications have become broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if provided timely and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that the submission is correct but that an error needs to be fixed. The present work increases our understanding of LLMs' potential, limitations, and how to integrate them into e-assessment systems, pedagogical scenarios, and instructing students who are using applications based on GPT-4.


Pages: 31-37

Page count: 7
