Evaluating Explanations for Software Patches Generated by Large Language Models

Cited by: 0
Authors
Sobania, Dominik [1 ]
Geiger, Alina [1 ]
Callan, James [2 ]
Brownlee, Alexander [3 ]
Hanna, Carol [2 ]
Moussa, Rebecca [2 ]
Lopez, Mar Zamorano [2 ]
Petke, Justyna [2 ]
Sarro, Federica [2 ]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Mainz, Germany
[2] UCL, London, England
[3] Univ Stirling, Stirling, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Large Language Models; Software Patches; AI Explainability; Program Repair; Genetic Improvement;
DOI
10.1007/978-3-031-48796-5_12
Chinese Library Classification
TP31 [Computer Software]
Discipline Codes
081202; 0835
Abstract
Large language models (LLMs) have recently been integrated into a variety of applications, including software engineering tasks. In this work, we study the use of LLMs to enhance the explainability of software patches. In particular, we evaluate the performance of GPT-3.5 in explaining patches generated by the search-based automated program repair system ARJA-e for 30 bugs from the popular Defects4J benchmark. We also investigate its performance when explaining the corresponding patches written by software developers. We find that, on average, 84% of the LLM explanations for machine-generated patches were correct and 54% were complete for the studied categories in at least one out of three runs. Furthermore, we find that the LLM generates more accurate explanations for machine-generated patches than for human-written ones.
Pages: 147-152 (6 pages)