Feedback-Generation for Programming Exercises With GPT-4

Cited by: 6
Authors
Azaiz, Imen [1]
Kiesler, Natalie [2]
Strickroth, Sven [1]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Nuremberg Tech, Nurnberg, Germany
Source
PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024 | 2024
Keywords
formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4; Turbo; benchmarking;
DOI
10.1145/3649217.3653594
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Ever since Large Language Models (LLMs) and related applications have become broadly available, several studies investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if provided timely and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted such as stating that the submission is correct but an error needs to be fixed. The present work increases our understanding of LLMs' potential, limitations, and how to integrate them into e-assessment systems, pedagogical scenarios, and instructing students who are using applications based on GPT-4.
Pages: 31-37
Number of pages: 7
Related papers
50 in total
  • [31] Investigating the Perception of the Future in GPT-3,-3.5 and GPT-4
    Kozachek, Diana
    2023 PROCEEDINGS OF THE 15TH CONFERENCE ON CREATIVITY AND COGNITION, C&C 2023, 2023, : 282 - 287
  • [32] Text understanding in GPT-4 versus humans
    Shultz, Thomas R.
    Wise, Jamie M.
    Nobandegani, Ardavan S.
    ROYAL SOCIETY OPEN SCIENCE, 2025, 12 (02):
  • [33] GPT-4: The Future of Cosmetic Procedure Consultation?
    Sun, Yi-Xin
    Li, Zi-Ming
    Huang, Jiu-Zuo
    Yu, Nan-ze
    Long, Xiao
    AESTHETIC SURGERY JOURNAL, 2023, 43 (08) : NP670 - NP672
  • [34] GPT-4 is here: what scientists think
    Sanderson, Katharine
    NATURE, 2023, 615 (7954) : 773 - 773
  • [35] The Potential and Pitfalls of GPT-4 in Radiologic Assessment
    Arachchige, Arosh S. Perera Molligoda
    ACADEMIC RADIOLOGY, 2024, 31 (08) : 3446 - 3447
  • [36] GPT-4 is here: what scientists think
    Katharine Sanderson
    Nature, 2023, 615 : 773 - 773
  • [37] GPT-4 wins chatbot lawyer contest
    Hsu, Jeremy
    NEW SCIENTIST, 2023, 246 (3456) : 16 - 16
  • [38] Uncovering the semantics of concepts using GPT-4
    Le Mens, Gael
    Kovacs, Balazs
    Hannan, Michael T.
    Pros, Guillem
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2023, 120 (49)
  • [39] GPT-4, artificial intelligence and implications for publishing
    Ong, C. W. M.
    Blackbourn, H. D.
    Migiliori, G. B.
    INTERNATIONAL JOURNAL OF TUBERCULOSIS AND LUNG DISEASE, 2023, 27 (06) : 425 - 426
  • [40] From GPT-4 to GPT-4o: Progress and Challenges in ECG Interpretation
    Pandya, Vidish
    Ge, Alan
    Ramineni, Shreya
    Danilov, Alexandrina
    Kirdar, Faisal
    Di Biase, Luigi
    Ferrick, Kevin
    Krumerman, Andrew
    CIRCULATION, 2024, 150