A critical review on the evaluation of automated program repair systems

Cited by: 48
Authors
Liu, Kui [1]
Li, Li [2]
Koyuncu, Anil [3]
Kim, Dongsun [4]
Liu, Zhe [1]
Klein, Jacques [3]
Bissyande, Tegawende F. [3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Monash Univ, Fac Informat Technol, Clayton, Vic, Australia
[3] Univ Luxembourg, Interdisciplinary Ctr Secur Reliabil & Trust SnT, Luxembourg, Luxembourg
[4] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
Funding
National Natural Science Foundation of China
Keywords
Automated program repair; Evaluation; Assessment; Metrics
DOI
10.1016/j.jss.2020.110817
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Automated Program Repair (APR) has attracted significant attention from software engineering research and practice communities in the last decade. Several teams have recorded promising performance in fixing real bugs and there is a race in the literature to fix as many bugs as possible from established benchmarks. Gradually, repair performance of APR tools in the literature has gone from being evaluated with a metric on the number of generated plausible patches to the number of correct patches. This evolution is necessary after a study highlighting the overfitting issue in test suite-based automatic patch generation. Simultaneously, some researchers are also insisting on providing time cost in the repair scenario as a metric for comparing state-of-the-art systems. In this paper, we discuss how the latest evaluation metrics of APR systems could be biased. Since design decisions (both in approach and evaluation setup) are not always fully disclosed, the impact on repair performance is unknown and computed metrics are often misleading. To reduce notable biases of design decisions in program repair approaches, we conduct a critical review on the evaluation of patch generation systems and propose eight evaluation metrics for fairly assessing the performance of APR tools. Eventually, we show with experimental data on 11 baseline program repair systems that the proposed metrics allow to highlight some caveats in the literature. We expect wide adoption of these metrics in the community to contribute to boosting the development of practical, and reliably performable program repair tools. © 2020 Elsevier Inc. All rights reserved.
Pages: 13