A controlled experiment of different code representations for learning-based program repair

Cited by: 8
Authors
Namavar, Marjane [1]
Nashid, Noor [1]
Mesbah, Ali [1]
Affiliation
[1] Univ British Columbia, Vancouver, BC, Canada
Keywords
Program repair; Deep learning; Code representation; Controlled experiment; Tool
DOI
10.1007/s10664-022-10223-5
CLC number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches have been proposed to represent source code, from sequences of tokens to abstract syntax trees. However, there is no systematic study to understand the effect of code representation on learning performance. Through a controlled experiment, we examine the impact of various code representations on model accuracy and usefulness in deep learning-based program repair. We train 21 different generative models that suggest fixes for name-based bugs, including 14 different homogeneous code representations, four mixed representations for the buggy and fixed code, and three different embeddings. We assess whether fix suggestions produced by the model in various code representations are automatically patchable, meaning they can be transformed to valid code that is ready to be applied to the buggy code to fix it. We also conduct a developer study to qualitatively evaluate the usefulness of inferred fixes in different code representations. Our results highlight the importance of code representation and its impact on learning and usefulness. Our findings indicate that (1) while code abstractions help the learning process, they can adversely impact the usefulness of inferred fixes from a developer's point of view; this emphasizes the need to look at the patches generated from the practitioner's perspective, which is often neglected in the literature, (2) mixed representations can outperform homogeneous code representations, (3) bug type can affect the effectiveness of different code representations; although current techniques use a single code representation for all bug types, there is no single best code representation applicable to all bug types.
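The abstract contrasts token sequences, abstract syntax trees, and abstracted code. As an illustration only (this record does not describe the authors' actual pipeline, which is not reproduced here), the Python sketch below derives three such representations for a hypothetical swapped-argument bug using the standard-library `tokenize` and `ast` modules. The function names, the `VAR_i` placeholder scheme, and the `concretize` check (mapping placeholders back to names and re-parsing, used as a rough stand-in for "automatically patchable") are all assumptions made for this example.

```python
from __future__ import annotations

import ast
import io
import keyword
import tokenize

# Layout-only token types carry no content for this illustration, so drop them.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def token_sequence(src: str) -> list[str]:
    """Representation 1: the flat sequence of lexical tokens."""
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return [t.string for t in toks if t.type not in _SKIP]


def abstract_tokens(src: str, mapping: dict[str, str]) -> list[str]:
    """Representation 2: tokens with each identifier replaced by VAR_i.

    `mapping` (identifier -> placeholder) is meant to be shared between the
    buggy and fixed versions; new identifiers are added to it in place.
    """
    out = []
    for tok in token_sequence(src):
        if tok.isidentifier() and not keyword.iskeyword(tok):
            if tok not in mapping:
                mapping[tok] = f"VAR_{len(mapping)}"
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out


def ast_preorder(src: str) -> list[str]:
    """Representation 3: AST node type names in depth-first pre-order."""
    def visit(node: ast.AST):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)
    return list(visit(ast.parse(src)))


def concretize(abstract: list[str], mapping: dict[str, str]) -> str | None:
    """Map placeholders back to names; return None if that fails or the
    result does not parse (a rough stand-in for "patchable")."""
    inverse = {v: k for k, v in mapping.items()}
    try:
        code = " ".join(inverse[t] if t.startswith("VAR_") else t
                        for t in abstract)
        ast.parse(code)
    except (KeyError, SyntaxError):
        return None
    return code


if __name__ == "__main__":
    # Hypothetical name-based bug: the two arguments are swapped.
    buggy = "result = divide(denominator, numerator)\n"
    fixed = "result = divide(numerator, denominator)\n"
    mapping: dict[str, str] = {}
    print(token_sequence(buggy))
    # ['result', '=', 'divide', '(', 'denominator', ',', 'numerator', ')']
    print(abstract_tokens(buggy, mapping))
    # ['VAR_0', '=', 'VAR_1', '(', 'VAR_2', ',', 'VAR_3', ')']
    abstract_fix = abstract_tokens(fixed, mapping)
    print(abstract_fix)
    # ['VAR_0', '=', 'VAR_1', '(', 'VAR_3', ',', 'VAR_2', ')']
    print(ast_preorder(buggy) == ast_preorder(fixed))  # True: node types carry no names
    print(concretize(abstract_fix, mapping))
    # result = divide ( numerator , denominator )
```

Two points show up in the output of this toy example: the identifier mapping has to be shared between buggy and fixed code for an abstracted fix to be turned back into concrete names, and the node-type sequence is identical for both versions, so that representation on its own cannot express a name-based fix.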
Pages: 39
Related papers (50 in total)
  • [1] A controlled experiment of different code representations for learning-based program repair
    Namavar, Marjane
    Nashid, Noor
    Mesbah, Ali
    Empirical Software Engineering, 2022, 27
  • [2] Doctor Code: A machine learning-based approach to program repair
    Moosavi, Sh
    Vahidi-Asl, M.
    Haghighi, H.
    Rezaalipour, M.
    Scientia Iranica, 2024, 31 (02): 83-102
  • [3] A Survey of Learning-based Automated Program Repair
    Zhang, Quanjun
    Fang, Chunrong
    Ma, Yuxiang
    Sun, Weisong
    Chen, Zhenyu
    ACM Transactions on Software Engineering and Methodology, 2024, 33 (02)
  • [4] DEAR: A Novel Deep Learning-based Approach for Automated Program Repair
    Li, Yi
    Wang, Shaohua
    Nguyen, Tien N.
    2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 2022: 511-523
  • [5] Learning Program Semantics with Code Representations: An Empirical Study
    Siow, Jing Kai
    Liu, Shangqing
    Xie, Xiaofei
    Meng, Guozhu
    Liu, Yang
    2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022), 2022: 554-565
  • [6] An Extensive Study on Model Architecture and Program Representation in the Domain of Learning-based Automated Program Repair
    Horvath, Daniel
    Csuvik, Viktor
    Gyimothy, Tibor
    Vidacs, Laszlo
    2023 IEEE/ACM International Workshop on Automated Program Repair (APR), 2023: 31-38
  • [7] Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair
    Kim, Misoo
    Kim, Youngkyoung
    Heo, Jinseok
    Jeong, Hohyeon
    Kim, Sungoh
    Lee, Eunseok
    2022 IEEE International Conference on Software Maintenance and Evolution (ICSME 2022), 2022: 419-423
  • [8] An Empirical Study of Deep Transfer Learning-Based Program Repair for Kotlin Projects
    Kim, Misoo
    Kim, Youngkyoung
    Jeong, Hohyeon
    Heo, Jinseok
    Kim, Sungoh
    Chung, Hyunhee
    Lee, Eunseok
    Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022), 2022: 1441-1452
  • [9] Deep Learning-Based Program-Wide Binary Code Similarity for Smart Contracts
    Zhuang, Yuan
    Wang, Baobao
    Sun, Jianguo
    Liu, Haoyang
    Yang, Shuqi
    Ma, Qingan
    CMC-Computers Materials & Continua, 2023, 74 (01): 1011-1024