A critical review on the evaluation of automated program repair systems

Cited by: 48

Authors:

Liu, Kui [1]

Li, Li [2]

Koyuncu, Anil [3]

Kim, Dongsun [4]

Liu, Zhe [1]

Klein, Jacques [3]

Bissyande, Tegawende F. [3]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China

[2] Monash Univ, Fac Informat Technol, Clayton, Vic, Australia

[3] Univ Luxembourg, Interdisciplinary Ctr Secur Reliabil & Trust SnT, Luxembourg, Luxembourg

[4] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea

Source:

JOURNAL OF SYSTEMS AND SOFTWARE | 2021 / Volume 171

Funding:

National Natural Science Foundation of China;

Keywords:

Automated program repair; Evaluation; Assessment; Metrics;

DOI:

10.1016/j.jss.2020.110817

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

Automated Program Repair (APR) has attracted significant attention from the software engineering research and practice communities in the last decade. Several teams have recorded promising performance in fixing real bugs, and there is a race in the literature to fix as many bugs as possible from established benchmarks. Gradually, the repair performance of APR tools in the literature has gone from being evaluated by the number of generated plausible patches to the number of correct patches. This evolution became necessary after a study highlighted the overfitting issue in test suite-based automatic patch generation. Simultaneously, some researchers are also insisting on reporting time cost in the repair scenario as a metric for comparing state-of-the-art systems. In this paper, we discuss how the latest evaluation metrics of APR systems could be biased. Since design decisions (both in approach and evaluation setup) are not always fully disclosed, their impact on repair performance is unknown and computed metrics are often misleading. To reduce notable biases of design decisions in program repair approaches, we conduct a critical review on the evaluation of patch generation systems and propose eight evaluation metrics for fairly assessing the performance of APR tools. Finally, we show with experimental data on 11 baseline program repair systems that the proposed metrics make it possible to highlight some caveats in the literature. We expect wide adoption of these metrics in the community to contribute to boosting the development of practical and reliable program repair tools. (c) 2020 Elsevier Inc. All rights reserved.

Pages: 13
