Automated Program Repair in the Era of Large Pre-trained Language Models

Cited: 106
Authors
Xia, Chunqiu Steven [1 ]
Wei, Yuxiang [1 ]
Zhang, Lingming [1 ]
Affiliations
[1] University of Illinois, Champaign, IL 61820, USA
Keywords
CODE;
DOI
10.1109/ICSE48619.2023.00129
Chinese Library Classification (CLC)
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety and fail to fix complicated bugs. This is mainly due to their reliance on bug-fixing datasets to craft fix templates (traditional) or to directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained on billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged LLMs for APR without relying on any bug-fixing datasets. However, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets. Thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B in size. We design 3 different repair settings to evaluate the different ways we can use LLMs to generate patches: 1) generate the entire patched function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the LLMs under these repair settings on 5 datasets across 3 different languages and compare the LLMs in terms of the number of bugs fixed, generation speed, and compilation rate. We also compare the LLMs against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art LLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied LLMs, the scaling effect holds for APR, where larger models tend to achieve better performance. We also show, for the first time, that the suffix code after the buggy line (adopted in infilling-style APR) is important for not only generating more fixes but also producing patches with higher compilation rates. Besides patch generation, the LLMs consider correct patches to be more natural than other generated patches, and can even be leveraged for effective patch ranking or patch correctness checking. Lastly, we show that LLM-based APR can be further substantially boosted via: 1) increasing the sample size, and 2) incorporating fix template information.
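Illustrative sketch (not part of the paper's artifact): the infilling repair setting and the naturalness-based patch ranking described in the abstract can be approximated with an off-the-shelf code LLM. The sketch below assumes a HuggingFace checkpoint such as facebook/incoder-1B; the sentinel-token prompt format, the sampling parameters, and the middle() example are assumptions made only for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model; any infilling-capable code LLM would follow the same outline.
MODEL_NAME = "facebook/incoder-1B"
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def generate_patches(prefix, suffix, n=10, max_new_tokens=64):
    # Infilling setting: sample candidate code for the buggy chunk between the
    # prefix and suffix context. InCoder-style sentinel tokens are assumed here;
    # other models use different infilling formats.
    prompt = prefix + "<|mask:0|>" + suffix + "<|mask:0|>"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=tok.eos_token_id,
    )
    new_tokens = out[:, inputs["input_ids"].shape[1]:]
    return [tok.decode(t, skip_special_tokens=True) for t in new_tokens]

def mean_token_entropy(code):
    # Naturalness proxy: average negative log-likelihood per token of the
    # patched function; lower values mean the LLM finds the code more natural.
    ids = tok(code, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# Hypothetical buggy function: the chunk between prefix and suffix is to be repaired.
prefix = "def middle(x, y, z):\n    if y < z:\n"
suffix = "\n        return z\n    return y\n"
candidates = generate_patches(prefix, suffix)
# Rank sampled patches by naturalness; correct patches tend to rank higher (lower entropy).
ranked = sorted(candidates, key=lambda c: mean_token_entropy(prefix + c + suffix))

In the paper's terminology this corresponds to the infilling setting combined with entropy-based patch ranking; the single-line and complete-function settings differ only in how much code the model is asked to produce.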
Pages: 1482-1494
Number of pages: 13