A controlled experiment of different code representations for learning-based program repair

Cited by: 8
Authors
Namavar, Marjane [1 ]
Nashid, Noor [1 ]
Mesbah, Ali [1 ]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
Keywords
Program repair; Deep learning; Code representation; Controlled experiment; TOOL;
DOI
10.1007/s10664-022-10223-5
CLC classification number
TP31 [Computer Software];
Discipline code
081202; 0835;
Abstract
Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches have been proposed to represent source code, from sequences of tokens to abstract syntax trees. However, there is no systematic study to understand the effect of code representation on learning performance. Through a controlled experiment, we examine the impact of various code representations on model accuracy and usefulness in deep learning-based program repair. We train 21 different generative models that suggest fixes for name-based bugs, including 14 different homogeneous code representations, four mixed representations for the buggy and fixed code, and three different embeddings. We assess whether fix suggestions produced by the model in various code representations are automatically patchable, meaning they can be transformed into valid code that is ready to be applied to the buggy code to fix it. We also conduct a developer study to qualitatively evaluate the usefulness of inferred fixes in different code representations. Our results highlight the importance of code representation and its impact on learning and usefulness. Our findings indicate that (1) while code abstractions help the learning process, they can adversely impact the usefulness of inferred fixes from a developer's point of view; this emphasizes the need to look at generated patches from the practitioner's perspective, which is often neglected in the literature; (2) mixed representations can outperform homogeneous code representations; (3) bug type can affect the effectiveness of different code representations; although current techniques use a single code representation for all bug types, there is no single best code representation applicable to all bug types.
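To make the contrast between representations concrete, the following minimal sketch (illustrative only, not the paper's pipeline) shows a token-sequence view and an abstract-syntax-tree view of the same one-line snippet, using only Python's standard library. The snippet, variable names, and the misspelled identifier are hypothetical examples of a name-based bug.

    import ast
    import io
    import tokenize

    # Hypothetical snippet with a name-based bug: "quantiy" is a misspelled identifier.
    buggy_snippet = "total = price * quantiy\n"

    # Token-sequence representation: a flat list of (token type, text) pairs.
    tokens = [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(buggy_snippet).readline)
        if tok.string.strip()
    ]
    print(tokens)
    # -> [('NAME', 'total'), ('OP', '='), ('NAME', 'price'), ('OP', '*'), ('NAME', 'quantiy')]

    # AST representation: a tree that captures the assignment and the binary
    # operation explicitly, independent of surface formatting.
    tree = ast.parse(buggy_snippet)
    print(ast.dump(tree, indent=2))

A token sequence flattens the code into lexical units, whereas the AST preserves structural relationships; the paper's experiment compares how such representational choices affect what a repair model can learn and how useful its suggested fixes are.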
Pages: 39
Related papers (50 in total)
  • [31] QRnet: fast learning-based QR code image embedding
    Pena-Pena, Karelia
    Lau, Daniel L.
    Arce, Andrew J.
    Arce, Gonzalo R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (08) : 10653 - 10672
  • [32] Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair
    Tian, Haoye
    Liu, Kui
    Kabore, Abdoul Kader
    Koyuncu, Anil
    Li, Li
    Klein, Jacques
    Bissyande, Tegawende F.
    2020 35TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE 2020), 2020, : 981 - 992
  • [33] Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities
    White, Martin
    Tufano, Michele
    Martinez, Matias
    Monperrus, Martin
    Poshyvanyk, Denys
    2019 IEEE 26TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER), 2019, : 479 - 490
  • [34] The Impact of Code Bloat on Genetic Program Comprehension: Replication of a Controlled Experiment on Semantic Inference
    Kosar, Tomaz
    Kovacevic, Zeljko
    Mernik, Marjan
    Slivnik, Bostjan
    MATHEMATICS, 2023, 11 (17)
  • [35] WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis
    Shirzad, Mohammad Robati
    Lam, Patrick
    arXiv,
  • [36] Deep Learning-Based Acoustic Feature Representations for Dysarthric Speech Recognition
    Latha M.
    Shivakumar M.
    Manjula G.
    Hemakumar M.
    Kumar M.K.
    SN Computer Science, 4 (3)
  • [37] Omics Data and Data Representations for Deep Learning-Based Predictive Modeling
    Tsimenidis, Stefanos
    Vrochidou, Eleni
    Papakostas, George A.
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2022, 23 (20)
  • [38] Environment Representations of Railway Infrastructure for Reinforcement Learning-Based Traffic Control
    Lovetei, Istvan
    Kovari, Balint
    Becsi, Tamas
    Aradi, Szilard
    APPLIED SCIENCES-BASEL, 2022, 12 (09)
  • [39] Compiler-Based Graph Representations for Deep Learning Models of Code
    Brauckmann, Alexander
    Goens, Andres
    Ertel, Sebastian
    Castrillon, Jeronimo
    PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION (CC '20), 2020, : 201 - 211
  • [40] Learning Code Representations Using Multifractal-based Graph Networks
    Ma, Guixiang
    Xiao, Yao
    Capota, Mihai
    Willke, Theodore L.
    Nazarian, Shahin
    Bogdan, Paul
    Ahmed, Nesreen K.
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 1858 - 1866