Evaluating Explanations for Software Patches Generated by Large Language Models

Cited by: 0
Authors
Sobania, Dominik [1 ]
Geiger, Alina [1 ]
Callan, James [2 ]
Brownlee, Alexander [3 ]
Hanna, Carol [2 ]
Moussa, Rebecca [2 ]
Lopez, Mar Zamorano [2 ]
Petke, Justyna [2 ]
Sarro, Federica [2 ]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Mainz, Germany
[2] UCL, London, England
[3] Univ Stirling, Stirling, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Large Language Models; Software Patches; AI Explainability; Program Repair; Genetic Improvement;
DOI
10.1007/978-3-031-48796-5_12
Chinese Library Classification
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
Large language models (LLMs) have recently been integrated into a variety of applications, including software engineering tasks. In this work, we study the use of LLMs to enhance the explainability of software patches. In particular, we evaluate the performance of GPT-3.5 in explaining patches generated by the search-based automated program repair system ARJA-e for 30 bugs from the popular Defects4J benchmark. We also investigate the performance achieved when explaining the corresponding patches written by software developers. We find that, on average, 84% of the LLM explanations for machine-generated patches were correct and 54% were complete for the studied categories in at least 1 out of 3 runs. Furthermore, we find that the LLM generates more accurate explanations for machine-generated patches than for human-written ones.
Pages: 147 - 152
Number of pages: 6
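
The evaluation setup described in the abstract, prompting GPT-3.5 to explain a software patch and repeating each query several times (the study uses 3 runs per patch), could look roughly like the minimal sketch below. The prompt wording, the example diff, the model name string, and the use of the openai Python client are assumptions for illustration only; they are not taken from the paper.

```python
# Illustrative sketch only: prompt wording, example diff, and client usage are
# assumptions, not the paper's actual experimental code.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical unified diff standing in for an ARJA-e or developer patch.
PATCH = """\
--- a/src/main/java/org/example/Foo.java
+++ b/src/main/java/org/example/Foo.java
@@ -42,7 +42,7 @@
-        if (index > values.length) {
+        if (index >= values.length) {
             throw new IndexOutOfBoundsException();
         }
"""

def explain_patch(patch: str, runs: int = 3) -> list[str]:
    """Ask the model to explain the same patch several times (mirroring 3 runs)."""
    explanations = []
    for _ in range(runs):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # stand-in for the GPT-3.5 model evaluated in the study
            messages=[
                {"role": "user",
                 "content": "Explain what the following software patch changes "
                            "and why it fixes the bug:\n\n" + patch},
            ],
        )
        explanations.append(response.choices[0].message.content)
    return explanations

if __name__ == "__main__":
    for i, text in enumerate(explain_patch(PATCH), start=1):
        print(f"--- Explanation (run {i}) ---\n{text}\n")
```

Each returned explanation would then be judged manually for correctness and completeness, as reported in the abstract.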