Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Times cited: 49
Authors
Ye, Wei [1]
Xie, Rui [1, 2]
Zhang, Jinglei [1]
Hu, Tianxiang [1, 2]
Wang, Xiaoyin [3]
Zhang, Shikun [1]
Affiliations
[1] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China
[2] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[3] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX USA
Keywords
code retrieval; code summarization; code generation; dual learning
DOI
10.1145/3366423.3380295
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Code summarization generates a brief natural language description of a given source code snippet, while code retrieval fetches relevant source code for a given natural language query. Since both tasks aim to model the association between natural language and programming language, recent studies have combined them to improve performance. However, researchers have not yet been able to effectively leverage the intrinsic connection between the two tasks, because they train the tasks separately or in a pipeline manner, so the performance of the two tasks cannot be well balanced. In this paper, we propose a novel end-to-end model for the two tasks that introduces an additional code generation task. More specifically, we explicitly exploit the probabilistic correlation between code summarization and code generation with dual learning, and we use the two encoders for code summarization and code generation to train the code retrieval task via multi-task learning. We have carried out extensive experiments on an existing dataset of SQL and Python code, and the results show that our model significantly improves code retrieval over state-of-the-art models while achieving competitive BLEU scores on the code summarization task.
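The combined training objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, assuming the standard probabilistic duality regularizer from dual supervised learning, log P(x) + log P(y|x) = log P(y) + log P(x|y), where x is a code snippet and y its summary; this record does not state the paper's exact loss terms. The function names, the cosine-similarity hinge loss used for retrieval, the margin, and the weights `lambda_dual` and `lambda_ret` are hypothetical choices for illustration only. In the setting the abstract describes, the query vector would come from the code generation encoder and the code vectors from the code summarization encoder.

```python
import math


def duality_gap_loss(log_p_code, log_p_summary_given_code,
                     log_p_summary, log_p_code_given_summary):
    """Squared violation of the duality constraint
    log P(x) + log P(y|x) = log P(y) + log P(x|y),
    with x = code snippet and y = natural-language summary.
    Assumption: the marginals log P(x) and log P(y) come from
    separately trained language models."""
    gap = ((log_p_code + log_p_summary_given_code)
           - (log_p_summary + log_p_code_given_summary))
    return gap ** 2


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranking_loss(query_vec, pos_code_vec, neg_code_vec, margin=0.05):
    """Hypothetical hinge loss for retrieval: the relevant snippet should
    score at least `margin` higher than an irrelevant one for the query."""
    return max(0.0, margin
               - cosine(query_vec, pos_code_vec)
               + cosine(query_vec, neg_code_vec))


def multitask_loss(summarization_nll, generation_nll, dual_gap, retrieval,
                   lambda_dual=0.1, lambda_ret=1.0):
    """Hypothetical weighted sum of the three tasks' losses plus the
    duality regularizer; the weights are illustrative, not the paper's."""
    return (summarization_nll + generation_nll
            + lambda_dual * dual_gap + lambda_ret * retrieval)
```

For example, `duality_gap_loss(-10.0, -4.0, -8.0, -7.0)` returns 1.0, because the two factorizations of the joint log-probability differ by one nat. Squaring the gap penalizes violations in either direction, which is what couples the summarization and generation models during training.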
Pages: 2309-2319
Number of pages: 11