Evaluating Explanations for Software Patches Generated by Large Language Models

Cited by: 0
Authors
Sobania, Dominik [1 ]
Geiger, Alina [1 ]
Callan, James [2 ]
Brownlee, Alexander [3 ]
Hanna, Carol [2 ]
Moussa, Rebecca [2 ]
Lopez, Mar Zamorano [2 ]
Petke, Justyna [2 ]
Sarro, Federica [2 ]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Mainz, Germany
[2] UCL, London, England
[3] Univ Stirling, Stirling, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Large Language Models; Software Patches; AI Explainability; Program Repair; Genetic Improvement;
DOI
10.1007/978-3-031-48796-5_12
Chinese Library Classification
TP31 [Computer Software]
Discipline Codes
081202; 0835
Abstract
Large language models (LLMs) have recently been integrated into a variety of applications, including software engineering tasks. In this work, we study the use of LLMs to enhance the explainability of software patches. In particular, we evaluate the performance of GPT-3.5 in explaining patches generated by the search-based automated program repair system ARJA-e for 30 bugs from the popular Defects4J benchmark. We also investigate its performance when explaining the corresponding patches written by software developers. We find that, on average, 84% of the LLM explanations for machine-generated patches were correct and 54% were complete for the studied categories in at least one out of three runs. Furthermore, we find that the LLM generates more accurate explanations for machine-generated patches than for human-written ones.
Pages: 147-152 (6 pages)