A syntax-guided multi-task learning approach for Turducken-style code generation

Cited by: 2
Authors
Yang, Guang [1 ]
Zhou, Yu [1 ]
Chen, Xiang [2 ]
Zhang, Xiangyu [1 ]
Xu, Yiran [1 ]
Han, Tingting [3 ]
Chen, Taolue [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
[3] Birkbeck Univ London, Dept Comp Sci, London, England
Funding
National Natural Science Foundation of China;
Keywords
Syntactically-constrained code generation; Turducken-style code; Multi-task learning; CodeT5; Abstract syntax tree;
DOI
10.1007/s10664-023-10372-1
CLC Classification
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code does not always adhere to the syntactic constraints of the target language, especially for Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we identify three significant challenges concerning syntactic constraints: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) a scalable syntax-first decoding algorithm. To address these challenges, we propose a syntax-guided multi-task learning approach, TurduckenGen. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. Then we formalize code generation with this syntactic constraint representation as an auxiliary task, enabling the model to learn the syntactic constraints of the code. Finally, syntactically correct code is accurately selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis demonstrate the effectiveness and general applicability of our approach in comparison with six state-of-the-art baselines on two Turducken-style code datasets. We also conducted a human study and found that the code generated by our approach surpasses the baselines in terms of readability and semantic similarity.
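The final selection step described in the abstract, filtering generated candidates by whether they actually compile, can be sketched in miniature. The following is a hypothetical illustration (not the paper's implementation), using Python's standard `ast.parse` as a stand-in for the compiler feedback on candidates of Turducken-style code (here, SQL strings embedded in imperative Python):

```python
import ast

def filter_syntactically_valid(candidates):
    """Keep only candidates accepted by the host-language parser.

    A minimal stand-in for the compiler-feedback selection step:
    each generated candidate is parsed, and those raising a
    SyntaxError are discarded.
    """
    valid = []
    for code in candidates:
        try:
            ast.parse(code)  # raises SyntaxError on malformed code
            valid.append(code)
        except SyntaxError:
            continue
    return valid

# Two generated candidates: the second is missing a closing parenthesis.
candidates = [
    "cursor.execute('SELECT name FROM users')",
    "cursor.execute('SELECT name FROM users'",
]
print(filter_syntactically_valid(candidates))  # keeps only the first candidate
```

In the paper's setting the check comes from the target language's compiler rather than `ast.parse`, and selection picks the best-ranked candidate among those that pass, rather than all of them.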
Pages: 35
Related Papers
50 records
  • [1] A syntax-guided multi-task learning approach for Turducken-style code generation
    Guang Yang
    Yu Zhou
    Xiang Chen
    Xiangyu Zhang
    Yiran Xu
    Tingting Han
    Taolue Chen
    Empirical Software Engineering, 2023, 28
  • [2] An Adversarial Approach for Unsupervised Syntax-Guided Paraphrase Generation
    Xue, Tang
    Zhao, Yuran
    Liu, Gongshen
    Li, Xiaoyong
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 364 - 376
  • [3] Syntax-guided question generation using prompt learning
    Hou, Zheheng
    Bi, Sheng
    Qi, Guilin
    Zheng, Yuanchun
    Ren, Zuomin
    Li, Yun
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (12): : 6271 - 6282
  • [4] Syntax-guided question generation using prompt learning
    Zheheng Hou
    Sheng Bi
    Guilin Qi
    Yuanchun Zheng
    Zuomin Ren
    Yun Li
    Neural Computing and Applications, 2024, 36 : 6271 - 6282
  • [5] Reinforcement Learning and Data-Generation for Syntax-Guided Synthesis
    Parsert, Julian
    Polgreen, Elizabeth
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10670 - 10678
  • [6] Syntax-controlled paraphrases generation with VAE and multi-task learning
    Jia, Xiyuan
    Mao, Zongqing
    Zhang, Zhen
    Lv, Qiyun
    Wang, Xin
    Wu, Guohua
    COMPUTER SPEECH AND LANGUAGE, 2025, 89
  • [7] MulCode: A Multi-task Learning Approach for Source Code Understanding
    Wang, Deze
    Yu, Yue
    Li, Shanshan
    Dong, Wei
    Wang, Ji
    Qing, Liao
    2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2021), 2021, : 48 - 59
  • [8] Granular Syntax Processing with Multi-Task and Curriculum Learning
    Zhang, Xulang
    Mao, Rui
    Cambria, Erik
    COGNITIVE COMPUTATION, 2024, 16 (06) : 3020 - 3034
  • [9] Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning
    Winata, Genta Indra
    Madotto, Andrea
    Wu, Chien-Sheng
    Fung, Pascale
    COMPUTATIONAL APPROACHES TO LINGUISTIC CODE-SWITCHING, 2018, : 62 - 67
  • [10] Metric-Guided Multi-task Learning
    Ren, Jinfu
    Liu, Yang
    Liu, Jiming
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2020), 2020, 12117 : 21 - 31