Impact of Code Language Models on Automated Program Repair

Cited by: 36
Authors
Jiang, Nan [1 ]
Liu, Kevin [2 ]
Lutellier, Thibaud [3 ]
Tan, Lin [1 ]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Lynbrook High Sch, San Jose, CA USA
[3] Univ Alberta, Edmonton, AB, Canada
Keywords
Automated Program Repair; Code Language Model; Fine-Tuning; Deep Learning;
DOI
10.1109/ICSE48619.2023.00125
CLC Classification
TP31 [Computer Software];
Subject Classification
081202; 0835;
Abstract
Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and are effective in software tasks such as code completion, there has been little comprehensive, in-depth work to evaluate CLMs' bug-fixing capabilities or to fine-tune CLMs for the APR task. Firstly, this work is the first to evaluate ten CLMs on four APR benchmarks, showing that, surprisingly, the best CLM, as is, fixes 72% more bugs than state-of-the-art deep-learning (DL)-based APR techniques. Secondly, one of the four APR benchmarks was created by us in this paper to avoid data leakage and ensure a fair evaluation. Thirdly, this is the first work to fine-tune CLMs with APR training data, showing that fine-tuning brings a 31%-1,267% improvement to CLMs and enables them to fix 46%-164% more bugs than existing DL-based APR techniques. Fourthly, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of the buggy lines to fix bugs, while fine-tuned CLMs could potentially over-rely on them. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs. This work points to promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs; it also raises awareness of the need for fair and comprehensive evaluations of CLMs, and calls for more transparent reporting of the open-source repositories used in pre-training data to address the data-leakage problem.
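The fine-tuning described in the abstract trains a CLM on pairs of buggy and fixed code. A minimal sketch of how such training pairs might be constructed is shown below; the `<BUG>`/`</BUG>` marker tokens and the `make_apr_pair` helper are illustrative assumptions, not the paper's exact input representation:

```python
# Build (input, target) pairs for fine-tuning a code language model on APR
# training data: the buggy function with its buggy line marked, paired with
# the fixed line. The <BUG>/</BUG> marker tokens and this pairing scheme are
# illustrative assumptions, not the paper's precise format.

def make_apr_pair(buggy_func: str, buggy_line_idx: int, fixed_line: str):
    """Return (model_input, target) for one single-line bug.

    model_input: the buggy function with the buggy line wrapped in marker
    tokens, so the model knows where to patch.
    target: the correct replacement line the model should generate.
    """
    lines = buggy_func.splitlines()
    lines[buggy_line_idx] = f"<BUG> {lines[buggy_line_idx].strip()} </BUG>"
    return "\n".join(lines), fixed_line


# Example: a function that returns the smaller value instead of the larger one.
buggy = """def max_of(a, b):
    if a < b:
        return a
    return b"""

model_input, target = make_apr_pair(buggy, 1, "    if a > b:")
print(model_input)
print("TARGET:", target)
```

Each (input, target) pair can then be fed to a standard sequence-to-sequence fine-tuning loop; the marker tokens correspond to the abstract's observation that fine-tuned models learn to attend to the buggy lines.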
Pages: 1430 - 1442
Page count: 13
Related papers
50 records total
  • [41] Investigating the impact of test cases on the performance of automated program repair
    Matsuda N.
    Maruyama K.
    Computer Software, 2020, 37 (04): 31 - 37
  • [42] A Method for Automated Program Code Testing
    Drasutis, Sigitas
    Motekaityte, Vida
    Noreika, Algirdas
    INFORMATICS IN EDUCATION, 2010, 9 (02): 199 - 208
  • [43] Automated Infrastructure as Code Program Testing
    Sokolowski, Daniel
    Spielmann, David
    Salvaneschi, Guido
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, 50 (06): 1585 - 1599
  • [44] Mining software repair models for reasoning on the search space of automated program fixing
    Martinez, Matias
    Monperrus, Martin
    EMPIRICAL SOFTWARE ENGINEERING, 2015, 20 (01): 176 - 205
  • [45] Automated Program Repair Using Donor Code Generation Based on Features of Targeted Systems
    Yasuda, Kazuya
    Itoh, Shinji
    Nakamura, Tomonori
    Harada, Masao
    Higo, Yoshiki
    Computer Software, 2021, 38 (04): 23 - 32
  • [47] Framing Program Repair as Code Completion
    Ribeiro, Francisco
    Abreu, Rui
    Saraiva, Joao
    INTERNATIONAL WORKSHOP ON AUTOMATED PROGRAM REPAIR (APR 2022), 2022, : 38 - 45
  • [48] Structural Language Models of Code
    Alon, Uri
    Sadaka, Roy
    Levy, Omer
    Yahav, Eran
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020
  • [49] SpecGen: Automated Generation of Formal Program Specifications via Large Language Models
    Ma, Lezhi
    Liu, Shangqing
    Li, Yi
    Xie, Xiaofei
    Bu, Lei
    arXiv preprint
  • [50] Automated Code Repair Based on Inferred Specifications
    Klieber, William
    Snavely, Will
    2016 IEEE CYBERSECURITY DEVELOPMENT (IEEE SECDEV 2016), 2016, : 130 - 137