CPC: Automatically Classifying and Propagating Natural Language Comments via Program Analysis

Cited by: 30

Authors:

Zhai, Juan [1,2]

Xu, Xiangzhe [3]

Shi, Yu [1]

Tao, Guanhong [1]

Pan, Minxue [3]

Ma, Shiqing [2]

Xu, Lei [3]

Zhang, Weifeng [4]

Tan, Lin [1]

Zhang, Xiangyu [1]

Affiliations:

[1] Purdue Univ, W Lafayette, IN 47907 USA

[2] Rutgers State Univ, Piscataway, NJ USA

[3] Nanjing Univ, Nanjing, Peoples R China

[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China

Source:

2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020

Keywords:

DOI:

10.1145/3377811.3380427

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202; 0835;

Abstract:

Code comments provide abundant information that has been leveraged to help perform various software engineering tasks, such as bug detection, specification inference, and code synthesis. However, developers are often not motivated to write and update comments, which makes leveraging comments for these tasks infeasible or error-prone. In this paper, we propose to leverage program analysis to systematically derive, refine, and propagate comments. For example, by propagation via program analysis, comments can be passed on to code entities that are not commented, so that code bugs can be detected using the propagated comments. Developers usually comment on different aspects of code elements such as methods, and use comments to describe various kinds of content, such as functionalities and properties. To utilize comments more effectively, a fine-grained and elaborated taxonomy of comments and a reliable classifier that automatically categorizes a comment are needed. We therefore build a comprehensive taxonomy and propose using program analysis to propagate comments. We develop a prototype, CPC, and evaluate it on 5 projects. The evaluation results demonstrate that 41,573 new comments can be derived by propagation from other code locations with 88% accuracy. Among them, we can derive precise functional comments for 87 native methods that have neither existing comments nor source code. Leveraging the propagated comments, we detect 37 new bugs in large open-source projects, 30 of which have been confirmed and fixed by developers, and 304 defects in existing comments (found by looking at inconsistencies between existing and propagated comments), including 12 incomplete comments and 292 wrong comments. This demonstrates the effectiveness of our approach. Our user study confirms that propagated comments align well with existing comments in terms of quality.

Pages: 1359-1371

Page count: 13
