CPC: Automatically Classifying and Propagating Natural Language Comments via Program Analysis

Cited by: 30
Authors
Zhai, Juan [1, 2]
Xu, Xiangzhe [3]
Shi, Yu [1]
Tao, Guanhong [1]
Pan, Minxue [3]
Ma, Shiqing [2]
Xu, Lei [3]
Zhang, Weifeng [4]
Tan, Lin [1]
Zhang, Xiangyu [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Rutgers State Univ, Piscataway, NJ USA
[3] Nanjing Univ, Nanjing, Peoples R China
[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China
DOI
10.1145/3377811.3380427
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Code comments provide abundant information that has been leveraged to help perform various software engineering tasks, such as bug detection, specification inference, and code synthesis. However, developers are less motivated to write and update comments, making it infeasible and error-prone to leverage comments to facilitate software engineering tasks. In this paper, we propose to leverage program analysis to systematically derive, refine, and propagate comments. For example, by propagation via program analysis, comments can be passed on to code entities that are not commented, so that code bugs can be detected by leveraging the propagated comments. Developers usually comment on different aspects of code elements such as methods, and use comments to describe various contents, such as functionalities and properties. To utilize comments more effectively, a fine-grained and elaborate taxonomy of comments and a reliable classifier to automatically categorize a comment are needed. In this paper, we build a comprehensive taxonomy and propose using program analysis to propagate comments. We develop a prototype, CPC, and evaluate it on 5 projects. The evaluation results demonstrate that 41,573 new comments can be derived by propagation from other code locations with 88% accuracy. Among them, we can derive precise functional comments for 87 native methods that have neither existing comments nor source code. Leveraging the propagated comments, we detect 37 new bugs in large open-source projects, 30 of which have been confirmed and fixed by developers, and 304 defects in existing comments (by looking at inconsistencies between existing and propagated comments), including 12 incomplete comments and 292 wrong comments. This demonstrates the effectiveness of our approach. Our user study confirms that propagated comments align well with existing comments in terms of quality.
Pages: 1359-1371
Page count: 13