Test-Driven Multi-Task Learning with Functionally Equivalent Code Transformation for Neural Code Generation

Authors
Wang, Xin [1]
Liu, Xiao [2]
Zhou, Pingyi [3]
Liu, Qixia [4]
Liu, Jin [1]
Wu, Hao [5]
Cui, Xiaohui [6]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Deakin Univ, Sch Informat Technol, Geelong, Vic, Australia
[3] Huawei Technol, Noahs Ark Lab, Shenzhen, Peoples R China
[4] China Mobile Commun Corp, Suzhou, Peoples R China
[5] Yunnan Univ, Sch Informat Sci & Engn, Kunming, Peoples R China
[6] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Neural Code Generation; Program Analysis; Execution Feedback; Code Transformation; Multi-Task Learning;
DOI
10.1145/3551349.3559549
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Automated code generation is a longstanding challenge in both the software engineering and artificial intelligence communities. Recent work has begun to investigate the functional correctness of generated code, where a code snippet is considered correct if it passes a set of test cases. However, most existing approaches still model code generation as text generation and ignore program-specific information such as functionally equivalent code snippets and test execution feedback. To address these limitations, this paper proposes a method that combines program analysis with deep learning for neural code generation, taking functionally equivalent code snippets and test execution feedback into account at the training stage. Concretely, we first design several code transformation heuristics that produce different variants of a code snippet with the same functionality. In addition, we use test execution feedback to design a test-driven discriminative task that trains a novel discriminator, so that the model learns to distinguish whether generated code is correct. Preliminary results on a newly published dataset demonstrate the effectiveness of the proposed framework for code generation. In particular, on the pass@1 metric, we achieve gains of 8.81 and 11.53 over CodeGPT and CodeT5, respectively.
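The abstract does not list the transformation heuristics or the discriminator design, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes variable renaming as one possible functionality-preserving transformation, and it derives a binary correct/incorrect label by executing test cases, which is the kind of signal a test-driven discriminator could be trained on. All names (`rename_variant`, `passes_tests`, `pass_at_1`) are hypothetical and not taken from the paper; `ast.unparse` requires Python 3.9 or later.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename identifiers via a mapping (an assumed heuristic, not the paper's)."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers variable reads and writes in the function body.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Covers function parameters; callers using keyword arguments
        # would break, so this sketch assumes positional calls only.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def rename_variant(source, mapping):
    """Return a functionally equivalent variant of `source` with renamed variables."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


def passes_tests(source, func_name, tests):
    """Execute `source` and report whether `func_name` passes every test case."""
    namespace = {}
    try:
        exec(source, namespace)  # only run trusted or sandboxed code this way
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Any syntax or runtime failure counts as incorrect.
        return False


def pass_at_1(labels):
    """Percentage of single samples that pass all of their tests."""
    return 100.0 * sum(labels) / len(labels)


ORIGINAL = "def add(a, b):\n    total = a + b\n    return total\n"
VARIANT = rename_variant(ORIGINAL, {"a": "x", "b": "y", "total": "result"})
BUGGY = "def add(a, b):\n    return a - b\n"
TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

# 1 = passes all tests, 0 = fails at least one; such labels could supervise
# a discriminator, which is not implemented here.
labels = [int(passes_tests(src, "add", TESTS)) for src in (ORIGINAL, VARIANT, BUGGY)]
```

Here the original and its renamed variant receive the same label because they behave identically on the tests, while the buggy candidate is labeled incorrect. In a training setup, such variants would serve as additional targets for the same description and the labels as supervision for the discriminative task.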
Pages: 6