Esale: Enhancing Code-Summary Alignment Learning for Source Code Summarization

Authors
Fang, Chunrong [1, 2]
Sun, Weisong [3]
Chen, Yuchen [1, 2]
Chen, Xiao [1, 2]
Wei, Zhao [4]
Zhang, Quanjun [1, 2]
You, Yudu [1, 2]
Luo, Bin [1, 2]
Liu, Yang [3]
Chen, Zhenyu [1, 2]
Affiliations
[1] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing 210093, People's Republic of China
[2] Nanjing University, Software Institute, Nanjing 210008, Jiangsu, People's Republic of China
[3] Nanyang Technological University, College of Computing and Data Science, Singapore 639798, Singapore
[4] Tencent Inc., Shenzhen 518057, People's Republic of China
Funding
National Research Foundation, Singapore; National Natural Science Foundation of China
Keywords
Source code summarization; deep learning; multi-task learning; comprehension
DOI
10.1109/TSE.2024.3422274
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
(Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in helping developers understand and maintain code. Inspired by neural machine translation, deep learning-based code summarization techniques widely adopt an encoder-decoder framework, where the encoder transforms given code snippets into context vectors, and the decoder decodes context vectors into summaries. Recently, large-scale pre-trained models for source code (e.g., CodeBERT and UniXcoder) have been equipped with encoders capable of producing general context vectors and have achieved substantial improvements on the code summarization task. However, these models are trained mainly on code-focused tasks; although they can capture general code features, they still fall short in capturing the specific features that need to be summarized. In short, they fail to learn the alignment between code snippets and summaries (code-summary alignment for short). In this paper, we propose a novel approach to improve code summarization based on summary-focused tasks. Specifically, we exploit a multi-task learning paradigm to train the encoder on three summary-focused tasks that enhance its ability to learn code-summary alignment: unidirectional language modeling (ULM), masked language modeling (MLM), and action word prediction (AWP). Unlike pre-trained models that mainly predict masked tokens in code snippets, we design ULM and MLM to predict masked words in summaries. Intuitively, predicting words based on given code snippets would help learn the code-summary alignment. In addition, existing work shows that AWP affects the prediction of the entire summary. Therefore, we further introduce the domain-specific task AWP to enhance the ability of the encoder to learn the alignment between action words and code snippets.
We evaluate the effectiveness of our approach, called Esale, by conducting extensive experiments on four datasets: two widely used datasets, JCSD and PCSD; a cross-project Java dataset, CPJD; and a multilingual dataset, CodeSearchNet. Experimental results show that Esale significantly outperforms state-of-the-art baselines on all three widely used metrics: BLEU, METEOR, and ROUGE-L. Moreover, the human evaluation shows that the summaries generated by Esale are more informative and closer to the ground-truth summaries.
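As a rough illustration of the three summary-focused tasks named in the abstract, the sketch below builds training targets from a tokenized summary. It is not the authors' implementation. Every name in it (`make_mlm_example`, `make_ulm_examples`, `action_word`, `multitask_loss`) is hypothetical, and so are the 15% masking rate, the `[MASK]` token, the treatment of the first summary word as the action word, and the weighted sum used to combine the losses. The abstract specifies none of these.

```python
import random

MASK = "[MASK]"  # assumed mask token; the abstract does not name one


def make_mlm_example(summary_tokens, mask_prob=0.15, rng=None):
    """MLM on the summary side: hide some summary words and keep them as targets."""
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(summary_tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    # Guarantee at least one masked position so the example is usable.
    if not targets and summary_tokens:
        i = rng.randrange(len(summary_tokens))
        targets[i] = summary_tokens[i]
        masked[i] = MASK
    return masked, targets


def make_ulm_examples(summary_tokens):
    """ULM: predict each summary word from the summary words to its left."""
    return [(summary_tokens[:i], summary_tokens[i])
            for i in range(len(summary_tokens))]


def action_word(summary_tokens):
    """AWP label. Assumption: the action word is the first summary word."""
    return summary_tokens[0].lower() if summary_tokens else None


def multitask_loss(ulm_loss, mlm_loss, awp_loss, weights=(1.0, 1.0, 1.0)):
    """Assumed combination rule: a weighted sum of the three task losses."""
    w_ulm, w_mlm, w_awp = weights
    return w_ulm * ulm_loss + w_mlm * mlm_loss + w_awp * awp_loss


if __name__ == "__main__":
    summary = "Returns the sum of two integers".split()
    print(make_mlm_example(summary))
    print(make_ulm_examples(summary)[:2])
    print(action_word(summary))
```

In a real model, each target would be predicted from the encoder's representation of the code snippet together with the visible summary words, which is what ties the summary words to the code.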
Pages: 2077-2095
Page count: 19