Impact of Code Language Models on Automated Program Repair

Cited by: 36
Authors
Jiang, Nan [1]
Liu, Kevin [2]
Lutellier, Thibaud [3]
Tan, Lin [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Lynbrook High Sch, San Jose, CA USA
[3] Univ Alberta, Edmonton, AB, Canada
Keywords
Automated Program Repair; Code Language Model; Fine-Tuning; Deep Learning
DOI
10.1109/ICSE48619.2023.00125
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and are effective in software tasks such as code completion, there has been little comprehensive, in-depth work to evaluate CLMs' fixing capabilities and to fine-tune CLMs for the APR task. First, this work is the first to evaluate ten CLMs on four APR benchmarks, showing that, surprisingly, the best CLM, as is, fixes 72% more bugs than the state-of-the-art deep-learning (DL)-based APR techniques. Second, one of the four APR benchmarks was created by the authors in this paper to avoid data leakage and allow a fair evaluation. Third, it is the first work to fine-tune CLMs with APR training data, showing that fine-tuning brings a 31%-1,267% improvement to CLMs and enables them to fix 46%-164% more bugs than existing DL-based APR techniques. Fourth, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of the buggy lines to fix bugs, whereas fine-tuned CLMs could potentially over-rely on them. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs. This work shows promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs. It also raises awareness of fair and comprehensive evaluations of CLMs and calls for more transparent reporting of the open-source repositories used in pre-training data to address the data leakage problem.
Pages: 1430-1442
Page count: 13
Related papers
50 in total
  • [31] Automated Program Repair
    Le Goues, Claire
    Pradel, Michael
    Roychoudhury, Abhik
    COMMUNICATIONS OF THE ACM, 2019, 62 (12) : 56 - 65
  • [32] Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models
    Xu, Kangwei
    Zhang, Grace Li
    Yin, Xunzhao
    Zhuo, Cheng
    Schlichmann, Ulf
    Li, Bing
    PROCEEDINGS OF THE 2024 ACM/IEEE INTERNATIONAL SYMPOSIUM ON MACHINE LEARNING FOR CAD, MLCAD 2024, 2024,
  • [33] Towards Minimal Edits in Automated Program Repair: A Hybrid Framework Integrating Graph Neural Networks and Large Language Models
    Xu, Zhenyu
    Sheng, Victor S.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT V, 2024, 15020 : 402 - 416
  • [34] Training Language Models for Programming Feedback Using Automated Repair Tools
    Koutcheme, Charles
    ARTIFICIAL INTELLIGENCE IN EDUCATION, AIED 2023, 2023, 13916 : 830 - 835
  • [35] On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools
    Papotti, Aurora
    Paramitha, Ranindya
    Massacci, Fabio
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29 (05)
  • [36] HeteroGen: Transpiling C to Heterogeneous HLS Code with Automated Test Generation and Program Repair
    Zhang, Qian
    Wang, Jiyuan
    Xu, Guoqing Harry
    Kim, Miryung
    ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2022, : 1017 - 1029
  • [37] Evaluating Large Language Models for Automated CPT Code Prediction in Endovascular Neurosurgery
    Roy, Joanna M.
    Self, D. Mitchell
    Isch, Emily
    Musmar, Basel
    Lan, Matthews
    Keppetipola, Kavantissa
    Koduri, Sravanthi
    Pontarelli, Mary-Katharine
    Tjoumakaris, Stavropoula I.
    Gooch, M. Reid
    Rosenwasser, Robert H.
    Jabbour, Pascal M.
    JOURNAL OF MEDICAL SYSTEMS, 2025, 49 (01)
  • [38] ENSESMELLS : Deep ensemble and programming language models for automated code smells detection
    Ho, Anh
    Bui, Anh M. T.
    Nguyen, Phuong T.
    Di Salle, Amleto
    Le, Bach
    JOURNAL OF SYSTEMS AND SOFTWARE, 2025, 224
  • [39] Investigating large language models capabilities for automatic code repair in Python
    Omari, Safwan
    Basnet, Kshitiz
    Wardat, Mohammad
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (08) : 10717 - 10731
  • [40] Automated Program Refinement: Guide and Verify Code Large Language Model with Refinement Calculus
    Cai, Yufan
    Hou, Zhe
    Sanan, David
    Luan, Xiaokun
    Lin, Yun
    Sun, Jun
    Dong, Jin Song
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2025, 9 (POPL)