Automated Program Repair in the Era of Large Pre-trained Language Models

Cited by: 106
Authors
Xia, Chunqiu Steven [1 ]
Wei, Yuxiang [1 ]
Zhang, Lingming [1 ]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
Keywords
CODE;
DOI
10.1109/ICSE48619.2023.00129
Chinese Library Classification (CLC)
TP31 [Computer Software];
Subject Classification Code
081202 ; 0835 ;
Abstract
Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety, failing to fix complicated bugs. This is mainly due to their reliance on bug-fixing datasets to craft fix templates (traditional) or to directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained on billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged LLMs for APR without relying on any bug-fixing datasets. However, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets; thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B parameters. We design 3 repair settings to evaluate the different ways LLMs can be used to generate patches: 1) generate the entire patched function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the LLMs under these repair settings to 5 datasets across 3 different languages and compare the LLMs in terms of the number of bugs fixed, generation speed, and compilation rate. We also compare the LLMs against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art LLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied LLMs, the scaling effect holds for APR: larger models tend to achieve better performance. We also show, for the first time, that the suffix code after the buggy line (adopted in infilling-style APR) is important for generating not only more fixes but also more patches with higher compilation rates. Beyond patch generation, the LLMs consider correct patches to be more natural than incorrect ones and can even be leveraged for effective patch ranking or patch correctness checking. Lastly, we show that LLM-based APR can be further substantially boosted by: 1) increasing the sample size, and 2) incorporating fix template information.
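As a concrete illustration of the workflow the abstract describes (sampling candidate patches from a code LLM and ranking them by naturalness), the sketch below shows one minimal way this could be wired up with an off-the-shelf Hugging Face model. It is not the paper's exact pipeline: the model checkpoint, prompt construction, sampling hyperparameters, and the generate_patches / mean_neg_log_prob helpers are illustrative assumptions only.

```python
# Minimal sketch (assumptions, not the paper's implementation): sample candidate
# patches with a generative code LLM, then rank them by a naturalness proxy
# (mean per-token negative log-likelihood under the same model; lower = more natural).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Salesforce/codegen-350M-mono"  # placeholder checkpoint; the study spans 125M-20B models
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def generate_patches(prefix: str, n: int = 10) -> list[str]:
    """Generative setting (simplified): sample n continuations of the code context."""
    ids = tok(prefix, return_tensors="pt").input_ids
    out = model.generate(
        ids, do_sample=True, temperature=0.8, max_new_tokens=128,
        num_return_sequences=n, pad_token_id=tok.eos_token_id,
    )
    # Strip the prompt tokens so only the generated candidate patch remains.
    return [tok.decode(o[ids.shape[1]:], skip_special_tokens=True) for o in out]

def mean_neg_log_prob(code: str) -> float:
    """Naturalness proxy: average negative log-probability of each token."""
    ids = tok(code, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)   # predictions for tokens 2..T
    targets = ids[0, 1:]                                     # actual tokens 2..T
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return nll.mean().item()

# Hypothetical buggy context; in practice this would come from a benchmark such as Defects4J.
buggy_context = "def is_even(n):\n    # BUG: wrong parity check below\n"
candidates = generate_patches(buggy_context)
ranked = sorted(candidates, key=mean_neg_log_prob)  # validate most "natural" patches first
```

Ranking by the model's own likelihood reflects the abstract's observation that correct patches tend to look more natural to the LLM; the infilling setting would additionally feed the suffix after the buggy line to an infilling-capable model, which is not shown here.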
Pages: 1482 - 1494
Number of pages: 13