Syntax-Aware Retrieval Augmented Code Generation

Cited by: 0
Authors
Zhang, Xiangyu [1]
Zhou, Yu [1]
Yang, Guang [1]
Chen, Taolue [2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Nanjing, Peoples R China
[2] Birkbeck Univ London, London, England
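Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes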
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural code generation models are now widely used to generate code automatically from natural language descriptions. Recently, pre-trained neural models equipped with token-level retrieval capabilities have shown great potential in neural machine translation. However, applying them directly to code generation poses challenges: the retrieval-based mechanism inevitably introduces extraneous noise into the generation process, even resulting in syntactically incorrect code. Computationally, such models require frequent searches of the cached datastore, which is time-consuming. To address these issues, we propose kNN-TRANX, a token-level retrieval-augmented code generation method. kNN-TRANX allows for searches in smaller datastores tailored to the code generation task. It leverages syntax constraints when retrieving from the datastores, which reduces the impact of retrieval noise. We evaluate kNN-TRANX on two public datasets, and the experimental results confirm the effectiveness of our approach.
Pages: 1291-1302
Page count: 12
Related papers
50 in total
  • [1] Syntax-aware on-the-fly code completion
    Takerngsaksiri, Wannita
    Tantithamthavorn, Chakkrit
    Li, Yuan-Fang
    INFORMATION AND SOFTWARE TECHNOLOGY, 2024, 165
  • [2] A Syntax-Aware Re-ranker for Microblog Retrieval
    Severyn, Aliaksei
    Moschitti, Alessandro
    Tsagkias, Manos
    Berendsen, Richard
    de Rijke, Maarten
    SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014, : 1067 - 1070
  • [3] srcQL: A Syntax-Aware Query Language for Source Code
    Bartman, Brian
    Newman, Christian D.
    Collard, Michael L.
    Maletic, Jonathan I.
    2017 IEEE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION, AND REENGINEERING (SANER), 2017, : 467 - 471
  • [4] A novel syntax-aware automatic graphics code generation with attention-based deep neural network
    Pang, Xiongwen
    Zhou, Yanqiang
    Li, Pengcheng
    Lin, Weiwei
    Wu, Wentai
    Wang, James Z.
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2020, 161
  • [5] Syntax-Aware Multi-Spans Generation for Reading Comprehension
    Zhang, Zhuosheng
    Zhang, Yiqing
    Zhao, Hai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 260 - 268
  • [6] TreeGAN: Syntax-aware Sequence Generation with Generative Adversarial Networks
    Liu, Xinyue
    Kong, Xiangnan
    Liu, Lei
    Chiang, Kuorong
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 1140 - 1145
  • [7] A Syntax-Aware Encoder for Authorship Attribution
    Liu, Jianbo
    Hu, Zhiqiang
    Zhang, Jiasheng
    Lee, Roy Ka-Wei
    Shao, Jie
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2021, PT I, 2021, 13080 : 403 - 411
  • [8] Towards syntax-aware token embeddings
    Popa, Diana Nicoleta
    Perez, Julien
    Henderson, James
    Gaussier, Eric
    NATURAL LANGUAGE ENGINEERING, 2021, 27 (06) : 691 - 720
  • [9] Syntax-Aware Representation for Aspect Term Extraction
    Zhang, Jingyuan
    Xu, Guangluan
    Wang, Xinyi
    Sun, Xian
    Huang, Tinglei
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2019, PT I, 2019, 11439 : 123 - 134
  • [10] Syntax-Aware Mutation for Testing the Solidity Compiler
    Mitropoulos, Charalambos
    Sotiropoulos, Thodoris
    Ioannidis, Sotiris
    Mitropoulos, Dimitris
    COMPUTER SECURITY - ESORICS 2023, PT III, 2024, 14346 : 327 - 347