Snippet Comment Generation Based on Code Context Expansion

Cited by: 1
Authors
Guo, Hanyang [1, 2]
Chen, Xiangping [3]
Huang, Yuan [4]
Wang, Yanlin [4]
Ding, Xi [5]
Zheng, Zibin [4]
Zhou, Xiaocong [5]
Dai, Hong-Ning [2]
Affiliations
[1] Sun Yat Sen Univ, Sch Software Engn, Guangzhou, Peoples R China
[2] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
[3] Sun Yat Sen Univ, Sch Commun & Design, Guangdong Key Lab Big Data Anal & Simulat Publ Op, Guangzhou, Peoples R China
[4] Sun Yat Sen Univ, Sch Software Engn, Zhuhai, Peoples R China
[5] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Snippet comment generation; code summarization; neural machine translation; contextual information;
DOI
10.1145/3611664
Chinese Library Classification
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
Code comments play an important role in program comprehension, and automatic comment generation helps improve software maintenance efficiency. The comments that annotate a method are mainly header comments and snippet comments. A header comment describes the functionality of the entire method and appears as a general comment at the beginning of the method. Snippet comments annotate individual code segments, called code snippets, within the method body. Both kinds help developers quickly understand code semantics, improving code readability and maintainability. However, existing automatic comment generation models focus mainly on header comments, because public datasets are available to validate their performance. By contrast, datasets for snippet comments are challenging to collect, because the scope of a snippet comment is difficult to determine. Even worse, code snippets are often too short to capture complete syntactic and semantic information. To address these challenges, we propose a novel Snippet Comment Generation approach called SCGen, which uses the context of a code snippet to expand its syntactic and semantic information. Specifically, we first collect 600,243 snippet code-comment pairs from 959 Java projects. Then, we capture variables from code snippets and extract variable-related statements from the context. After that, we devise an algorithm to parse and traverse the abstract syntax tree (AST) information of code snippets and their corresponding context. Finally, SCGen generates snippet comments by feeding the source code snippet and the corresponding AST information into a sequence-to-sequence-based model. We conducted extensive experiments on the collected dataset to evaluate SCGen. Our approach achieves 18.23 in BLEU-4, 18.83 in METEOR, and 23.65 in ROUGE-L, outperforming state-of-the-art comment generation models.
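The context-expansion step described in the abstract (capture variables from the snippet, then pull variable-related statements from the surrounding method) can be illustrated with a rough sketch. This is not SCGen's implementation: the abstract does not specify how variables are identified, so the regex heuristic below, the function names `variables` and `expand_context`, and the keyword list are all assumptions made for illustration. A real system would more plausibly resolve variables from a parsed AST.

```python
import re

# Hypothetical, incomplete keyword list used only for this sketch.
JAVA_KEYWORDS = {
    "int", "long", "double", "float", "boolean", "char", "byte", "short",
    "void", "for", "while", "if", "else", "return", "new", "final",
    "null", "true", "false", "this", "break", "continue",
}

# Assumption: a "variable" is a lowercase-initial identifier that is not a
# member access (not preceded by '.') and not a call (not followed by '(').
VAR_PATTERN = re.compile(r"(?<!\.)\b[a-z_]\w*\b(?!\s*\()")
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')


def variables(statement):
    """Return the set of likely variable names in one Java statement."""
    code = STRING_LITERAL.sub('""', statement)  # ignore words inside strings
    return {name for name in VAR_PATTERN.findall(code)
            if name not in JAVA_KEYWORDS}


def expand_context(statements, start, end):
    """Treat statements[start:end] as the snippet and return its variables
    plus the out-of-snippet statements that mention any of them."""
    snippet_vars = set()
    for stmt in statements[start:end]:
        snippet_vars |= variables(stmt)
    context = [stmt for i, stmt in enumerate(statements)
               if not (start <= i < end) and variables(stmt) & snippet_vars]
    return snippet_vars, context


method_body = [
    "int total = 0;",
    'String label = "sum";',
    "List<Integer> values = load();",
    "for (int v : values) {",
    "    total += v;",
    "}",
    "log(label);",
]
snippet_vars, context = expand_context(method_body, 3, 6)
```

With the loop at indices 3 to 5 as the snippet, the sketch selects the declarations of `total` and `values` as context and skips the statements that only involve `label`, which gives a short snippet the surrounding definitions it depends on.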
Pages: 30