Automated Program Repair in the Era of Large Pre-trained Language Models

Cited by: 106
Authors
Xia, Chunqiu Steven [1 ]
Wei, Yuxiang [1 ]
Zhang, Lingming [1 ]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
Keywords
CODE;
DOI
10.1109/ICSE48619.2023.00129
CLC Number
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques suffer from limited patch variety and fail to fix complicated bugs. This is mainly due to their reliance on bug-fixing datasets, either to craft fix templates (traditional) or to directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained on billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged LLMs for APR without relying on any bug-fixing datasets. However, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets. Thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B parameters. We design 3 different repair settings to evaluate the different ways LLMs can be used to generate patches: 1) generate the entire patched function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the LLMs under these repair settings on 5 datasets across 3 different languages and compare the LLMs in terms of the number of bugs fixed, generation speed, and compilation rate. We also compare the LLMs against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art LLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied LLMs, a scaling effect exists for APR: larger models tend to achieve better performance. We also show for the first time that the suffix code after the buggy line (adopted in infilling-style APR) is important not only for generating more fixes but also for producing patches with higher compilation rates. Beyond patch generation, the LLMs consider correct patches to be more natural than incorrect ones, and can even be leveraged for effective patch ranking or patch correctness checking. Lastly, we show that LLM-based APR can be further substantially boosted by: 1) increasing the sample size, and 2) incorporating fix template information.
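The three repair settings described in the abstract lend themselves to straightforward prompt construction. The following is a minimal sketch, in Python, of how such prompts might be assembled from a buggy function; the prompt wording, the `<INFILL>` sentinel, and the `generate()` stub are illustrative assumptions and not the paper's exact prompt formats or APIs.

```python
# Illustrative sketch of the three repair settings described in the abstract.
# The prompt formats, the "<INFILL>" sentinel, and the generate() stub are
# assumptions for illustration only, not the paper's exact setup.

from typing import List


def complete_function_prompt(buggy_function: str) -> str:
    """Setting 1: ask a generative model to regenerate the entire patched function."""
    return (
        "# The following function contains a bug. Provide a fixed version.\n"
        f"{buggy_function}\n"
        "# Fixed function:\n"
    )


def infilling_prompt(prefix: str, suffix: str) -> str:
    """Setting 2: give the code before and after the buggy chunk and let an
    infilling model fill in the missing piece (marked here with a sentinel)."""
    return f"{prefix}<INFILL>{suffix}"


def single_line_prompt(lines: List[str], buggy_line_idx: int) -> str:
    """Setting 3: remove only the buggy line and ask for a one-line replacement."""
    prefix = "\n".join(lines[:buggy_line_idx])
    suffix = "\n".join(lines[buggy_line_idx + 1:])
    return f"{prefix}\n<INFILL>\n{suffix}"


def generate(prompt: str, num_samples: int = 10) -> List[str]:
    """Placeholder for sampling candidate patches from an LLM of choice."""
    raise NotImplementedError("plug in a concrete model here")


if __name__ == "__main__":
    buggy = ["def add(a, b):", "    return a - b  # buggy line"]
    # Build a single-line-fix prompt; candidate patches would then be sampled,
    # compiled, and validated against the test suite.
    print(single_line_prompt(buggy, buggy_line_idx=1))
```

Candidate patches sampled from any of these prompts would then be filtered by compilation and validated against the project's test suite, as in standard generate-and-validate APR pipelines.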
Pages: 1482-1494
Page count: 13