Topic-based software defect explanation

Cited by: 15
Authors
Chen, Tse-Hsun [1]
Shang, Weiyi [2]
Nagappan, Meiyappan [3]
Hassan, Ahmed E. [1]
Thomas, Stephen W. [1]
Affiliations
[1] Queens Univ, Sch Comp, SAIL, Kingston, ON, Canada
[2] Concordia Univ, Montreal, PQ, Canada
[3] Rochester Inst Technol, Rochester, NY 14623 USA
Keywords
Code quality; Topic modeling; LDA; Metrics; Cohesion; Coupling; CONCEPTUAL COHESION; METRICS; PREDICTION;
DOI
10.1016/j.jss.2016.05.015
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
Researchers continue to propose metrics using measurable aspects of software systems to understand software quality. However, these metrics largely ignore the functionality, i.e., the conceptual concerns, of software systems. Such concerns are the technical concepts that reflect the system's business logic. For instance, while lines of code may be a good general measure for defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this paper, we study the effect of concerns on software quality. We use a statistical topic modeling approach to approximate software concerns as topics (related words in source code). We propose various metrics using these topics to help explain the file defect-proneness. Case studies on multiple versions of Firefox, Eclipse, Mylyn, and NetBeans show that (i) some topics are more defect-prone than others; (ii) defect-prone topics tend to remain so over time; (iii) our topic-based metrics provide additional explanatory power for software quality over existing structural and historical metrics; and (iv) our topic-based cohesion metric outperforms state-of-the-art topic-based cohesion and coupling metrics in terms of defect explanatory power, while being simpler to implement and more intuitive to interpret. (C) 2016 Elsevier Inc. All rights reserved.
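The idea of a "defect-prone topic" in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that per-file topic memberships have already been produced by a topic model such as LDA, and it assumes one plausible definition of topic defect density (each file's defects per line of code, weighted by the file's membership in the topic). The function name, file names, and numbers are hypothetical.

```python
from typing import Dict, List


def topic_defect_density(
    memberships: Dict[str, List[float]],
    defects: Dict[str, int],
    loc: Dict[str, int],
) -> List[float]:
    """Membership-weighted defect density per topic (assumed definition).

    memberships: file -> topic membership vector (e.g., from LDA).
    defects:     file -> number of defects attributed to the file.
    loc:         file -> lines of code in the file (must be > 0).
    """
    n_topics = len(next(iter(memberships.values())))
    density = [0.0] * n_topics
    for path, theta in memberships.items():
        file_density = defects[path] / loc[path]  # defects per line
        for k, weight in enumerate(theta):
            density[k] += weight * file_density
    return density


# Hypothetical data: topic 0 ~ "simple I/O", topic 1 ~ "compiler internals".
memberships = {
    "io.c": [0.9, 0.1],
    "parser.c": [0.2, 0.8],
}
defects = {"io.c": 1, "parser.c": 4}
loc = {"io.c": 1000, "parser.c": 200}

densities = topic_defect_density(memberships, defects, loc)
for k, value in enumerate(densities):
    print(f"topic {k}: {value:.4f}")
```

With this toy data the small but complicated file pushes topic 1 well above topic 0, which mirrors the abstract's point that file size alone does not capture defect-proneness.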
Pages: 79 - 106
Number of pages: 28
Related papers
50 in total
  • [31] Unsupervised Construction of Topic-based Twitter Lists
    de Villiers, Francois
    Hoffmann, McElory
    Kroon, Steve
    [J]. PROCEEDINGS OF 2012 ASE/IEEE INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY, RISK AND TRUST AND 2012 ASE/IEEE INTERNATIONAL CONFERENCE ON SOCIAL COMPUTING (SOCIALCOM/PASSAT 2012), 2012, : 283 - 292
  • [32] Automatic image annotation based on topic-based smoothing
    Zhou, XD
    Ye, JY
    Chen, L
    Zhang, L
    Shi, BL
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 86 - 93
  • [33] Topic-based Classification through Unigram Unmasking
    HaCohen-Kerner, Yaakov
    Rosenfeld, Avi
    Sabag, Asaf
    Tzidkani, Maor
    [J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES-2018), 2018, 126 : 69 - 76
  • [34] TCPM: Topic-based Clinical Pathway Mining
    Xu, Xiao
    Jin, Tao
    Wei, Zhijie
    Lv, Cheng
    Wang, Jianmin
    [J]. 2016 IEEE FIRST INTERNATIONAL CONFERENCE ON CONNECTED HEALTH: APPLICATIONS, SYSTEMS AND ENGINEERING TECHNOLOGIES (CHASE), 2016, : 292 - 301
  • [35] Assessing topic-based users credibility in twitter
    Meddeb, Amna
    Ben Romdhane, Lotfi
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 63329 - 63351
  • [36] Topic-based influential user detection: a survey
    Rrubaa Panchendrarajan
    Akrati Saxena
    [J]. Applied Intelligence, 2023, 53 : 5998 - 6024
  • [37] Proximity semantics for topic-based abstract argumentation
    Budan, Maximiliano C. D.
    Laura Cobo, Maria
    Martinez, Diego C.
    Simari, Guillermo R.
    [J]. INFORMATION SCIENCES, 2020, 508 : 135 - 153
  • [38] Topic-Based Unsupervised and Supervised Dictionary Induction
    Liu, Yuzhi
    Piccardi, Massimo
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (03)
  • [39] Time-aware Topic-based Contextualization
    Nam Khanh Tran
    Nejdl, Wolfgang
    Niederee, Claudia
    [J]. WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 15 - 19
  • [40] Topic-Based PageRank on Author Cocitation Networks
    Ding, Ying
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2011, 62 (03) : 449 - 466