Automated Program Repair in the Era of Large Pre-trained Language Models

Cited: 106
Authors
Xia, Chunqiu Steven [1 ]
Wei, Yuxiang [1 ]
Zhang, Lingming [1 ]
Affiliations
[1] University of Illinois, Champaign, IL 61820, USA
Keywords
CODE;
DOI
10.1109/ICSE48619.2023.00129
Chinese Library Classification (CLC)
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety and fail to fix complicated bugs. This is mainly due to their reliance on bug-fixing datasets to craft fix templates (traditional) or to directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained on billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged LLMs for APR without relying on any bug-fixing datasets. However, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets. Thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B in size. We design 3 different repair settings to evaluate the different ways we can use LLMs to generate patches: 1) generate the entire patched function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the LLMs under these repair settings on 5 datasets across 3 different languages and compare the LLMs in terms of the number of bugs fixed, generation speed, and compilation rate. We also compare the LLMs against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art LLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied LLMs, the scaling effect holds for APR, where larger models tend to achieve better performance. We also show, for the first time, that the suffix code after the buggy line (adopted in infilling-style APR) is important for not only generating more fixes but also producing patches with higher compilation rates. Besides patch generation, the LLMs consider correct patches to be more natural than other generated patches, and can even be leveraged for effective patch ranking or patch correctness checking. Lastly, we show that LLM-based APR can be further substantially boosted via: 1) increasing the sample size, and 2) incorporating fix template information.
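Illustrative sketch (not part of the paper's artifact): the infilling repair setting and the naturalness-based patch ranking described in the abstract can be approximated with an off-the-shelf code LLM. The sketch below assumes a HuggingFace checkpoint such as facebook/incoder-1B; the sentinel-token prompt format, the sampling parameters, and the middle() example are assumptions made only for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model; any infilling-capable code LLM would follow the same outline.
MODEL_NAME = "facebook/incoder-1B"
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def generate_patches(prefix, suffix, n=10, max_new_tokens=64):
    # Infilling setting: sample candidate code for the buggy chunk between the
    # prefix and suffix context. InCoder-style sentinel tokens are assumed here;
    # other models use different infilling formats.
    prompt = prefix + "<|mask:0|>" + suffix + "<|mask:0|>"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=tok.eos_token_id,
    )
    new_tokens = out[:, inputs["input_ids"].shape[1]:]
    return [tok.decode(t, skip_special_tokens=True) for t in new_tokens]

def mean_token_entropy(code):
    # Naturalness proxy: average negative log-likelihood per token of the
    # patched function; lower values mean the LLM finds the code more natural.
    ids = tok(code, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# Hypothetical buggy function: the chunk between prefix and suffix is to be repaired.
prefix = "def middle(x, y, z):\n    if y < z:\n"
suffix = "\n        return z\n    return y\n"
candidates = generate_patches(prefix, suffix)
# Rank sampled patches by naturalness; correct patches tend to rank higher (lower entropy).
ranked = sorted(candidates, key=lambda c: mean_token_entropy(prefix + c + suffix))

In the paper's terminology this corresponds to the infilling setting combined with entropy-based patch ranking; the single-line and complete-function settings differ only in how much code the model is asked to produce.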
Pages: 1482-1494
Number of pages: 13