A controlled experiment of different code representations for learning-based program repair

Cited by: 8
Authors
Namavar, Marjane [1 ]
Nashid, Noor [1 ]
Mesbah, Ali [1 ]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
Keywords
Program repair; Deep learning; Code representation; Controlled experiment; TOOL;
DOI
10.1007/s10664-022-10223-5
CLC classification number
TP31 [Computer Software];
Discipline code
081202; 0835;
Abstract
Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches have been proposed to represent source code, from sequences of tokens to abstract syntax trees. However, there is no systematic study to understand the effect of code representation on learning performance. Through a controlled experiment, we examine the impact of various code representations on model accuracy and usefulness in deep learning-based program repair. We train 21 different generative models that suggest fixes for name-based bugs, including 14 different homogeneous code representations, four mixed representations for the buggy and fixed code, and three different embeddings. We assess whether fix suggestions produced by the model in various code representations are automatically patchable, meaning they can be transformed into valid code that is ready to be applied to the buggy code to fix it. We also conduct a developer study to qualitatively evaluate the usefulness of inferred fixes in different code representations. Our results highlight the importance of code representation and its impact on learning and usefulness. Our findings indicate that (1) while code abstractions help the learning process, they can adversely impact the usefulness of inferred fixes from a developer's point of view; this emphasizes the need to look at generated patches from the practitioner's perspective, which is often neglected in the literature; (2) mixed representations can outperform homogeneous code representations; (3) bug type can affect the effectiveness of different code representations; although current techniques use a single code representation for all bug types, there is no single best code representation applicable to all bug types.
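To make the contrast between representations concrete, the following minimal sketch (illustrative only, not the paper's pipeline) shows a token-sequence view and an abstract-syntax-tree view of the same one-line snippet, using only Python's standard library. The snippet, variable names, and the misspelled identifier are hypothetical examples of a name-based bug.

    import ast
    import io
    import tokenize

    # Hypothetical snippet with a name-based bug: "quantiy" is a misspelled identifier.
    buggy_snippet = "total = price * quantiy\n"

    # Token-sequence representation: a flat list of (token type, text) pairs.
    tokens = [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(buggy_snippet).readline)
        if tok.string.strip()
    ]
    print(tokens)
    # -> [('NAME', 'total'), ('OP', '='), ('NAME', 'price'), ('OP', '*'), ('NAME', 'quantiy')]

    # AST representation: a tree that captures the assignment and the binary
    # operation explicitly, independent of surface formatting.
    tree = ast.parse(buggy_snippet)
    print(ast.dump(tree, indent=2))

A token sequence flattens the code into lexical units, whereas the AST preserves structural relationships; the paper's experiment compares how such representational choices affect what a repair model can learn and how useful its suggested fixes are.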
Pages: 39
Related papers (50 in total)
  • [31] QRnet: fast learning-based QR code image embedding
    Pena-Pena, Karelia
    Lau, Daniel L.
    Arce, Andrew J.
    Arce, Gonzalo R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (08) : 10653 - 10672
  • [32] Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair
    Tian, Haoye
    Liu, Kui
    Kabore, Abdoul Kader
    Koyuncu, Anil
    Li, Li
    Klein, Jacques
    Bissyande, Tegawende F.
    2020 35TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE 2020), 2020, : 981 - 992
  • [33] Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities
    White, Martin
    Tufano, Michele
    Martinez, Matias
    Monperrus, Martin
    Poshyvanyk, Denys
    2019 IEEE 26TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER), 2019, : 479 - 490
  • [34] The Impact of Code Bloat on Genetic Program Comprehension: Replication of a Controlled Experiment on Semantic Inference
    Kosar, Tomaz
    Kovacevic, Zeljko
    Mernik, Marjan
    Slivnik, Bostjan
    MATHEMATICS, 2023, 11 (17)
  • [35] WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis
    Shirzad, Mohammad Robati
    Lam, Patrick
    arXiv,
  • [36] Deep Learning-Based Acoustic Feature Representations for Dysarthric Speech Recognition
    Latha M.
    Shivakumar M.
    Manjula G.
    Hemakumar M.
    Kumar M.K.
    SN Computer Science, 4 (3)
  • [37] Omics Data and Data Representations for Deep Learning-Based Predictive Modeling
    Tsimenidis, Stefanos
    Vrochidou, Eleni
    Papakostas, George A.
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2022, 23 (20)
  • [38] Environment Representations of Railway Infrastructure for Reinforcement Learning-Based Traffic Control
    Lovetei, Istvan
    Kovari, Balint
    Becsi, Tamas
    Aradi, Szilard
    APPLIED SCIENCES-BASEL, 2022, 12 (09)
  • [39] Compiler-Based Graph Representations for Deep Learning Models of Code
    Brauckmann, Alexander
    Goens, Andres
    Ertel, Sebastian
    Castrillon, Jeronimo
    PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION (CC '20), 2020, : 201 - 211
  • [40] Learning Code Representations Using Multifractal-based Graph Networks
    Ma, Guixiang
    Xiao, Yao
    Capota, Mihai
    Willke, Theodore L.
    Nazarian, Shahin
    Bogdan, Paul
    Ahmed, Nesreen K.
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 1858 - 1866