A controlled experiment of different code representations for learning-based program repair

Cited by: 8

Authors:

Namavar, Marjane [1]

Nashid, Noor [1]

Mesbah, Ali [1]

Affiliations:

[1] Univ British Columbia, Vancouver, BC, Canada

Source:

EMPIRICAL SOFTWARE ENGINEERING | 2022 / Vol. 27 / Issue 07

Keywords:

Program repair; Deep learning; Code representation; Controlled experiment; TOOL;

DOI:

10.1007/s10664-022-10223-5

Chinese Library Classification number:

TP31 [Computer software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches have been proposed to represent source code, from sequences of tokens to abstract syntax trees. However, there is no systematic study to understand the effect of code representation on learning performance. Through a controlled experiment, we examine the impact of various code representations on model accuracy and usefulness in deep learning-based program repair. We train 21 different generative models that suggest fixes for name-based bugs, including 14 different homogeneous code representations, four mixed representations for the buggy and fixed code, and three different embeddings. We assess if fix suggestions produced by the model in various code representations are automatically patchable, meaning they can be transformed to a valid code that is ready to be applied to the buggy code to fix it. We also conduct a developer study to qualitatively evaluate the usefulness of inferred fixes in different code representations. Our results highlight the importance of code representation and its impact on learning and usefulness. Our findings indicate that (1) while code abstractions help the learning process, they can adversely impact the usefulness of inferred fixes from a developer's point of view; this emphasizes the need to look at the patches generated from the practitioner's perspective, which is often neglected in the literature, (2) mixed representations can outperform homogeneous code representations, (3) bug type can affect the effectiveness of different code representations; although current techniques use a single code representation for all bug types, there is no single best code representation applicable to all bug types.
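To make the notion of a code representation concrete, the following minimal sketch derives three representations of one buggy line: a raw token sequence, an abstracted token sequence in which identifiers become placeholders, and a pre-order sequence of AST node types. It is an illustration only, not the authors' pipeline. The helper names, the `ID1`, `ID2` placeholder scheme, and the swapped-argument call `set_size(height, width)` are all hypothetical, and Python's standard `tokenize` and `ast` modules are used purely for convenience.

```python
import ast
import io
import keyword
import tokenize
from typing import Iterator, List

# Token types that carry layout rather than content (an assumption of this sketch).
_LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def _tokens(source: str) -> Iterator[tokenize.TokenInfo]:
    """Yield the lexer's tokens for `source`, skipping layout tokens."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _LAYOUT:
            yield tok


def token_sequence(source: str) -> List[str]:
    """Raw representation: the token strings in source order."""
    return [tok.string for tok in _tokens(source)]


def abstracted_tokens(source: str) -> List[str]:
    """Abstracted representation: each distinct identifier is replaced
    by a placeholder (ID1, ID2, ...); keywords and operators are kept."""
    mapping = {}
    out = []
    for tok in _tokens(source):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            if tok.string not in mapping:
                mapping[tok.string] = "ID%d" % (len(mapping) + 1)
            out.append(mapping[tok.string])
        else:
            out.append(tok.string)
    return out


def ast_node_sequence(source: str) -> List[str]:
    """Tree-based representation, flattened: AST node type names
    visited in pre-order (depth-first, parent before children)."""
    def visit(node: ast.AST) -> Iterator[str]:
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)
    return list(visit(ast.parse(source)))


if __name__ == "__main__":
    buggy = "set_size(height, width)\n"  # hypothetical swapped-argument call
    print(token_sequence(buggy))
    print(abstracted_tokens(buggy))
    print(ast_node_sequence(buggy))
```

The abstracted form shows the trade-off the abstract describes: placeholders shrink the vocabulary a model has to learn, but a fix predicted in that form has to be mapped back to concrete names before a developer can apply it.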


Pages: 39


[1] A controlled experiment of different code representations for learning-based program repair
Marjane Namavar
Noor Nashid
Ali Mesbah
Empirical Software Engineering, 2022, 27
[2] Doctor Code: A machine learning-based approach to program repair
Moosavi, Sh
Vahidi-Asl, M.
Haghighi, H.
Rezaalipour, M.
Scientia Iranica, 2024, 31 (02) : 83 - 102
[3] A Survey of Learning-based Automated Program Repair
Zhang, Quanjun
Fang, Chunrong
Ma, Yuxiang
Sun, Weisong
Chen, Zhenyu
ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (02)
[4] DEAR: A Novel Deep Learning-based Approach for Automated Program Repair
Li, Yi
Wang, Shaohua
Nguyen, Tien N.
2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 511 - 523
[5] Learning Program Semantics with Code Representations: An Empirical Study
Siow, Jing Kai
Liu, Shangqing
Xie, Xiaofei
Meng, Guozhu
Liu, Yang
2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022), 2022, : 554 - 565
[6] An Extensive Study on Model Architecture and Program Representation in the Domain of Learning-based Automated Program Repair
Horvath, Daniel
Csuvik, Viktor
Gyimothy, Tibor
Vidacs, Laszlo
2023 IEEE/ACM INTERNATIONAL WORKSHOP ON AUTOMATED PROGRAM REPAIR, APR, 2023, : 31 - 38
[7] Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair
Kim, Misoo
Kim, Youngkyoung
Heo, Jinseok
Jeong, Hohyeon
Kim, Sungoh
Lee, Eunseok
2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2022), 2022, : 419 - 423
[8] An Empirical Study of Deep Transfer Learning-Based Program Repair for Kotlin Projects
Kim, Misoo
Kim, Youngkyoung
Jeong, Hohyeon
Heo, Jinseok
Kim, Sungoh
Chung, Hyunhee
Lee, Eunseok
PROCEEDINGS OF THE 30TH ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2022, 2022, : 1441 - 1452
[9] Deep Learning-Based Program-Wide Binary Code Similarity for Smart Contracts
Zhuang, Yuan
Wang, Baobao
Sun, Jianguo
Liu, Haoyang
Yang, Shuqi
Ma, Qingan
CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (01) : 1011 - 1024
[10] CONTROLLED EXPERIMENT IN PROGRAM TESTING AND CODE WALKTHROUGHS-INSPECTIONS
MYERS, GJ
COMMUNICATIONS OF THE ACM, 1978, 21 (09) : 760 - 768
