A critical review on the evaluation of automated program repair systems

Cited by: 48
Authors
Liu, Kui [1]
Li, Li [2]
Koyuncu, Anil [3]
Kim, Dongsun [4]
Liu, Zhe [1]
Klein, Jacques [3]
Bissyande, Tegawende F. [3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Monash Univ, Fac Informat Technol, Clayton, Vic, Australia
[3] Univ Luxembourg, Interdisciplinary Ctr Secur Reliabil & Trust SnT, Luxembourg, Luxembourg
[4] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
Funding
National Natural Science Foundation of China;
Keywords
Automated program repair; Evaluation; Assessment; Metrics;
DOI
10.1016/j.jss.2020.110817
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Automated Program Repair (APR) has attracted significant attention from software engineering research and practice communities in the last decade. Several teams have recorded promising performance in fixing real bugs and there is a race in the literature to fix as many bugs as possible from established benchmarks. Gradually, repair performance of APR tools in the literature has gone from being evaluated with a metric on the number of generated plausible patches to the number of correct patches. This evolution is necessary after a study highlighting the overfitting issue in test suite-based automatic patch generation. Simultaneously, some researchers are also insisting on providing time cost in the repair scenario as a metric for comparing state-of-the-art systems. In this paper, we discuss how the latest evaluation metrics of APR systems could be biased. Since design decisions (both in approach and evaluation setup) are not always fully disclosed, the impact on repair performance is unknown and computed metrics are often misleading. To reduce notable biases of design decisions in program repair approaches, we conduct a critical review on the evaluation of patch generation systems and propose eight evaluation metrics for fairly assessing the performance of APR tools. Eventually, we show with experimental data on 11 baseline program repair systems that the proposed metrics allow to highlight some caveats in the literature. We expect wide adoption of these metrics in the community to contribute to boosting the development of practical, and reliably performable program repair tools. © 2020 Elsevier Inc. All rights reserved.
Pages: 13
Related papers
50 in total
  • [21] Travel time models in automated storage retrieval systems: A critical review
    Sarker, BR
    Babu, PS
    INTERNATIONAL JOURNAL OF PRODUCTION ECONOMICS, 1995, 40 (2-3) : 173 - 184
  • [22] Enhancing Automated Program Repair with Deductive Verification
    Le, Xuan-Bach D.
    Le, Quang Loc
    Lo, David
    Le Goues, Claire
    32ND IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2016), 2016, : 428 - 432
  • [23] Automated Program Repair in an Integrated Development Environment
    Pei, Yu
    Furia, Carlo A.
    Nordio, Martin
    Meyer, Bertrand
    2015 IEEE/ACM 37TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, VOL 2, 2015, : 681 - 684
  • [24] Large Language Models for Automated Program Repair
    Ribeiro, Francisco
    COMPANION PROCEEDINGS OF THE 2023 ACM SIGPLAN INTERNATIONAL CONFERENCE ON SYSTEMS, PROGRAMMING, LANGUAGES, AND APPLICATIONS: SOFTWARE FOR HUMANITY, SPLASH COMPANION 2023, 2023, : 7 - 9
  • [25] How to Measure the Performance of Automated Program Repair?
    Qi, Yuhua
    Liu, Wenhong
    Zhang, Weixiang
    Yang, Deheng
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 246 - 250
  • [26] Automated patch assessment for program repair at scale
    Ye, He
    Martinez, Matias
    Monperrus, Martin
    EMPIRICAL SOFTWARE ENGINEERING, 2021, 26 (02)
  • [27] Adversarial patch generation for automated program repair
    Alhefdhi, Abdulaziz
    Dam, Hoa Khanh
    Le-Cong, Thanh
    Le, Bach
    Ghose, Aditya
    SOFTWARE QUALITY JOURNAL, 2025, 33 (01)
  • [28] The Impact of Search Algorithms in Automated Program Repair
    Assiri, Fatmah Yousef
    Bieman, James M.
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND SOFTWARE ENGINEERING (SCSE'15), 2015, 62 : 65 - 72
  • [29] High-Quality Automated Program Repair
    Motwani, Manish
    2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2021), 2021, : 309 - 314
  • [30] Static Automated Program Repair for Heap Properties
    van Tonder, Rijnard
    Le Goues, Claire
    PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018, : 151 - 162