Project-Level Encoding for Neural Source Code Summarization of Subroutines

Times cited: 19
Authors
Bansal, Aakash [1]
Haque, Sakib [1]
McMillan, Collin [1]
Affiliation
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Keywords
source code summarization; automatic documentation generation; neural networks; program comprehension
DOI
10.1109/ICPC52881.2021.00032
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Source code summarization of a subroutine is the task of writing a short, natural language description of that subroutine. The description usually serves in documentation aimed at programmers, where even a brief phrase (e.g., "compresses data to a zip file") can help readers rapidly comprehend what a subroutine does without resorting to reading the code itself. Techniques based on neural networks (and encoder-decoder model designs in particular) have established themselves as the state of the art. Yet a widely recognized problem with these models is that they assume the information needed to create a summary is present within the code being summarized itself, an assumption that is at odds with the program comprehension literature. Thus, a current research frontier lies in the question of how to encode source code context into neural models of summarization. In this paper, we present a project-level encoder to improve models of code summarization. By project-level, we mean that we create a vectorized representation of selected code files in a software project and use that representation to augment the encoder of state-of-the-art neural code summarization techniques. We demonstrate how our encoder improves several existing models, and provide guidelines for maximizing improvement while controlling time and resource costs in terms of model size.
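The abstract describes the project-level idea only at a high level: encode selected files of a project into a vector and use that vector to augment a subroutine encoder. The Python sketch that follows is a toy illustration of that general idea under stated assumptions, not the authors' architecture. The token-embedding table, the mean pooling over tokens and over files, and concatenation as the fusion step are all hypothetical choices made for readability; the actual models use learned neural encoders whose details the abstract does not give.

```python
from typing import Dict, List

Vector = List[float]


def embed_tokens(tokens: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Hypothetical stand-in for a learned encoder: mean of token embeddings.

    Tokens missing from the (assumed) embedding table map to zero vectors.
    """
    total = [0.0] * dim
    for tok in tokens:
        vec = table.get(tok, [0.0] * dim)
        for i in range(dim):
            total[i] += vec[i]
    n = max(len(tokens), 1)  # avoid division by zero for empty input
    return [x / n for x in total]


def project_context(files: List[List[str]], table: Dict[str, Vector], dim: int) -> Vector:
    """Assumed project-level encoding: mean-pool one vector per selected file."""
    file_vecs = [embed_tokens(f, table, dim) for f in files]
    if not file_vecs:
        return [0.0] * dim
    return [sum(v[i] for v in file_vecs) / len(file_vecs) for i in range(dim)]


def augment_encoding(subroutine_vec: Vector, project_vec: Vector) -> Vector:
    """Assumed fusion step: concatenate subroutine and project vectors."""
    return subroutine_vec + project_vec


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings, for illustration only.
    table = {"zip": [1.0, 0.0], "compress": [0.0, 1.0], "file": [1.0, 1.0]}
    project_files = [["zip", "file"], ["compress"]]
    proj = project_context(project_files, table, dim=2)
    sub = embed_tokens(["compress", "file"], table, dim=2)
    print(augment_encoding(sub, proj))  # prints [0.5, 1.0, 0.5, 0.75]
```

In a trained model the embeddings would be learned and the combined vector would feed the decoder; concatenation is only one simple fusion option, and attention over per-file vectors is a common alternative.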
Pages: 253-264
Page count: 12
Related papers (50 in total)
  • [1] Ensemble Models for Neural Source Code Summarization of Subroutines
    LeClair, Alexander
    Bansal, Aakash
    McMillan, Collin
    arXiv, 2021
  • [2] Ensemble Models for Neural Source Code Summarization of Subroutines
    LeClair, Alexander
    Bansal, Aakash
    McMillan, Collin
    2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021), 2021, pp. 286-297
  • [3] Function Call Graph Context Encoding for Neural Source Code Summarization
    Bansal, Aakash
    Eberhart, Zachary
    Karas, Zachary
    Huang, Yu
    McMillan, Collin
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (09): 4268-4281
  • [4] A Neural Framework for Retrieval and Summarization of Source Code
    Chen, Qingying
    Zhou, Minghui
    PROCEEDINGS OF THE 2018 33RD IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE '18), 2018, pp. 826-831
  • [5] Semantic similarity loss for neural source code summarization
    Su, Chia-Yi
    McMillan, Collin
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2024
  • [6] Retrieval-based Neural Source Code Summarization
    Zhang, Jian
    Wang, Xu
    Zhang, Hongyu
    Sun, Hailong
    Liu, Xudong
    2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), 2020, pp. 1385-1397
  • [7] Label Smoothing Improves Neural Source Code Summarization
    Haque, Sakib
    Bansal, Aakash
    McMillan, Collin
    2023 IEEE/ACM 31ST INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC, 2023, pp. 101-112
  • [8] Action Word Prediction for Neural Source Code Summarization
    Haque, Sakib
    Bansal, Aakash
    Wu, Lingfei
    McMillan, Collin
    2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2021), 2021, pp. 330-341
  • [9] Action Word Prediction for Neural Source Code Summarization
    Haque, Sakib
    Bansal, Aakash
    Wu, Lingfei
    McMillan, Collin
    arXiv, 2021
  • [10] Bi-LSTM-Based Neural Source Code Summarization
    Aljumah, Sarah
    Berriche, Lamia
    APPLIED SCIENCES-BASEL, 2022, 12 (24)