Esale: Enhancing Code-Summary Alignment Learning for Source Code Summarization

Cited by: 0

Authors:

Fang, Chunrong [1,2]

Sun, Weisong [3]

Chen, Yuchen [1,2]

Chen, Xiao [1,2]

Wei, Zhao [4]

Zhang, Quanjun [1,2]

You, Yudu [1,2]

Luo, Bin [1,2]

Liu, Yang [3]

Chen, Zhenyu [1,2]

Affiliations:

[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Peoples R China

[2] Nanjing Univ, Software Inst, Nanjing 210008, Jiangsu, Peoples R China

[3] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 639798, Singapore

[4] Tencent Inc, Shenzhen 518057, Peoples R China

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / No. 08

Funding:

National Research Foundation, Singapore; National Natural Science Foundation of China;

Keywords:

Source code summarization; deep learning; multi-task learning; COMPREHENSION;

DOI:

10.1109/TSE.2024.3422274

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

(Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in helping developers understand and maintain code. Inspired by neural machine translation, deep learning-based code summarization techniques widely adopt an encoder-decoder framework, where the encoder transforms given code snippets into context vectors, and the decoder decodes context vectors into summaries. Recently, large-scale pre-trained models for source code (e.g., CodeBERT and UniXcoder) have been equipped with encoders capable of producing general context vectors and have achieved substantial improvements on the code summarization task. However, although they are typically trained on code-focused tasks and can capture general code features, they still fall short in capturing the specific features that need to be summarized. In a nutshell, they fail to learn the alignment between code snippets and summaries (code-summary alignment for short). In this paper, we propose a novel approach to improve code summarization based on summary-focused tasks. Specifically, we exploit a multi-task learning paradigm to train the encoder on three summary-focused tasks to enhance its ability to learn code-summary alignment: unidirectional language modeling (ULM), masked language modeling (MLM), and action word prediction (AWP). Unlike pre-trained models that mainly predict masked tokens in code snippets, we design ULM and MLM to predict masked words in summaries. Intuitively, predicting words based on given code snippets would help learn the code-summary alignment. In addition, existing work shows that AWP affects the prediction of the entire summary. Therefore, we further introduce the domain-specific task AWP to enhance the ability of the encoder to learn the alignment between action words and code snippets.
We evaluate the effectiveness of our approach, called Esale, by conducting extensive experiments on four datasets: two widely used datasets, JCSD and PCSD; a cross-project Java dataset, CPJD; and a multilingual dataset, CodeSearchNet. Experimental results show that Esale significantly outperforms state-of-the-art baselines in all three widely used metrics: BLEU, METEOR, and ROUGE-L. Moreover, human evaluation shows that the summaries generated by Esale are more informative and closer to the ground-truth summaries.
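
To make the three summary-focused tasks in the abstract more concrete, the following is a minimal sketch of how training targets for ULM, MLM, and AWP might be derived from a single tokenized summary. It is not Esale's implementation. The abstract does not specify the masking rate, how action words are identified, or how the task losses are combined, so the function names, the 15% mask probability, the "first word of the summary is the action word" rule, and the weighted-sum loss are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def ulm_examples(summary_tokens):
    """ULM: predict each summary word from the words to its left only."""
    return [(summary_tokens[:i], summary_tokens[i])
            for i in range(len(summary_tokens))]


def mlm_example(summary_tokens, mask_prob=0.15, rng=None):
    """MLM: hide some summary words; the model must recover them.

    mask_prob=0.15 is an assumed value; the abstract gives no rate.
    Returns the masked sequence and a {position: original word} map.
    """
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(summary_tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def awp_label(summary_tokens):
    """AWP: assumed here to be the first word of the summary."""
    return summary_tokens[0].lower()


def multi_task_loss(ulm_loss, mlm_loss, awp_loss, weights=(1.0, 1.0, 1.0)):
    """Combine per-task losses as a weighted sum (assumed scheme)."""
    return sum(w * l for w, l in zip(weights, (ulm_loss, mlm_loss, awp_loss)))


if __name__ == "__main__":
    summary = ["Returns", "the", "sum", "of", "two", "numbers"]
    print(ulm_examples(summary)[:2])
    print(mlm_example(summary, mask_prob=0.5))
    print(awp_label(summary))
```

In an actual training setup, each of these targets would be predicted conditioned on the encoder's representation of the paired code snippet, which is what ties the tasks to code-summary alignment; the sketch only shows the target construction on the summary side.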

Pages: 2077-2095

Page count: 19
