CPC: Automatically Classifying and Propagating Natural Language Comments via Program Analysis

Cited by: 30

Authors:

Zhai, Juan [1,2]

Xu, Xiangzhe [3]

Shi, Yu [1]

Tao, Guanhong [1]

Pan, Minxue [3]

Ma, Shiqing [2]

Xu, Lei [3]

Zhang, Weifeng [4]

Tan, Lin [1]

Zhang, Xiangyu [1]

Affiliations:

[1] Purdue Univ, W Lafayette, IN 47907 USA

[2] Rutgers State Univ, Piscataway, NJ USA

[3] Nanjing Univ, Nanjing, Peoples R China

[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China

Source:

2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020

DOI:

10.1145/3377811.3380427

Chinese Library Classification number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Code comments provide abundant information that has been leveraged to help perform various software engineering tasks, such as bug detection, specification inference, and code synthesis. However, developers are less motivated to write and update comments, making it infeasible and error-prone to leverage comments to facilitate software engineering tasks. In this paper, we propose to leverage program analysis to systematically derive, refine, and propagate comments. For example, by propagation via program analysis, comments can be passed on to code entities that are not commented, so that code bugs can be detected by leveraging the propagated comments. Developers usually comment on different aspects of code elements such as methods, and use comments to describe various contents, such as functionalities and properties. To utilize comments more effectively, a fine-grained and elaborated taxonomy of comments and a reliable classifier to automatically categorize a comment are needed. In this paper, we build a comprehensive taxonomy and propose using program analysis to propagate comments. We develop a prototype, CPC, and evaluate it on 5 projects. The evaluation results demonstrate that 41573 new comments can be derived by propagation from other code locations with 88% accuracy. Among them, we can derive precise functional comments for 87 native methods that have neither existing comments nor source code. Leveraging the propagated comments, we detect 37 new bugs in large open-source projects, 30 of which have been confirmed and fixed by developers, and 304 defects in existing comments (found by looking at inconsistencies between existing and propagated comments), including 12 incomplete comments and 292 wrong comments. This demonstrates the effectiveness of our approach. Our user study confirms that propagated comments align well with existing comments in terms of quality.

Citation

Pages: 1359-1371

Page count: 13
