A syntax-guided multi-task learning approach for Turducken-style code generation

Cited by: 2
Authors
Yang, Guang [1 ]
Zhou, Yu [1 ]
Chen, Xiang [2 ]
Zhang, Xiangyu [1 ]
Xu, Yiran [1 ]
Han, Tingting [3 ]
Chen, Taolue [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
[3] Birkbeck Univ London, Dept Comp Sci, London, England
Funding
National Natural Science Foundation of China;
Keywords
Syntactically-constrained code generation; Turducken-style code; Multi-task learning; CodeT5; Abstract syntax tree;
DOI
10.1007/s10664-023-10372-1
CLC Classification
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code does not always adhere to the syntactic constraints of the target language, especially for Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we identify three significant challenges concerning syntactic constraints: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) a scalable syntax-first decoding algorithm. To address these challenges, we propose a syntax-guided multi-task learning approach, TurduckenGen. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. Then we formalize code generation with this syntactic constraint representation as an auxiliary task, enabling the model to learn the syntactic constraints of the code. Finally, syntactically correct code is accurately selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis demonstrate the effectiveness and general applicability of our approach in comparison with six state-of-the-art baselines on two Turducken-style code datasets. We also conducted a human study and found that the code generated by our approach surpasses the baselines in terms of readability and semantic similarity.
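The final selection step described in the abstract, filtering generated candidates by whether they actually compile, can be sketched in miniature. The following is a hypothetical illustration (not the paper's implementation), using Python's standard `ast.parse` as a stand-in for the compiler feedback on candidates of Turducken-style code (here, SQL strings embedded in imperative Python):

```python
import ast

def filter_syntactically_valid(candidates):
    """Keep only candidates accepted by the host-language parser.

    A minimal stand-in for the compiler-feedback selection step:
    each generated candidate is parsed, and those raising a
    SyntaxError are discarded.
    """
    valid = []
    for code in candidates:
        try:
            ast.parse(code)  # raises SyntaxError on malformed code
            valid.append(code)
        except SyntaxError:
            continue
    return valid

# Two generated candidates: the second is missing a closing parenthesis.
candidates = [
    "cursor.execute('SELECT name FROM users')",
    "cursor.execute('SELECT name FROM users'",
]
print(filter_syntactically_valid(candidates))  # keeps only the first candidate
```

In the paper's setting the check comes from the target language's compiler rather than `ast.parse`, and selection picks the best-ranked candidate among those that pass, rather than all of them.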
Pages: 35
Related Papers
50 records
  • [1] A syntax-guided multi-task learning approach for Turducken-style code generation
    Guang Yang
    Yu Zhou
    Xiang Chen
    Xiangyu Zhang
    Yiran Xu
    Tingting Han
    Taolue Chen
    Empirical Software Engineering, 2023, 28
  • [2] An Adversarial Approach for Unsupervised Syntax-Guided Paraphrase Generation
    Xue, Tang
    Zhao, Yuran
    Liu, Gongshen
    Li, Xiaoyong
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 364 - 376
  • [3] Syntax-guided question generation using prompt learning
    Hou, Zheheng
    Bi, Sheng
    Qi, Guilin
    Zheng, Yuanchun
    Ren, Zuomin
    Li, Yun
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (12): : 6271 - 6282
  • [4] Syntax-guided question generation using prompt learning
    Zheheng Hou
    Sheng Bi
    Guilin Qi
    Yuanchun Zheng
    Zuomin Ren
    Yun Li
    Neural Computing and Applications, 2024, 36 : 6271 - 6282
  • [5] Reinforcement Learning and Data-Generation for Syntax-Guided Synthesis
    Parsert, Julian
    Polgreen, Elizabeth
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10670 - 10678
  • [6] Syntax-controlled paraphrases generation with VAE and multi-task learning
    Jia, Xiyuan
    Mao, Zongqing
    Zhang, Zhen
    Lv, Qiyun
    Wang, Xin
    Wu, Guohua
    COMPUTER SPEECH AND LANGUAGE, 2025, 89
  • [7] MulCode: A Multi-task Learning Approach for Source Code Understanding
    Wang, Deze
    Yu, Yue
    Li, Shanshan
    Dong, Wei
    Wang, Ji
    Qing, Liao
    2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2021), 2021, : 48 - 59
  • [8] Granular Syntax Processing with Multi-Task and Curriculum Learning
    Zhang, Xulang
    Mao, Rui
    Cambria, Erik
    COGNITIVE COMPUTATION, 2024, 16 (06) : 3020 - 3034
  • [9] Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning
    Winata, Genta Indra
    Madotto, Andrea
    Wu, Chien-Sheng
    Fung, Pascale
    COMPUTATIONAL APPROACHES TO LINGUISTIC CODE-SWITCHING, 2018, : 62 - 67
  • [10] Metric-Guided Multi-task Learning
    Ren, Jinfu
    Liu, Yang
    Liu, Jiming
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2020), 2020, 12117 : 21 - 31