Topic-based software defect explanation

Cited by: 15
Authors
Chen, Tse-Hsun [1 ]
Shang, Weiyi [2 ]
Nagappan, Meiyappan [3 ]
Hassan, Ahmed E. [1 ]
Thomas, Stephen W. [1 ]
Affiliations
[1] Queens Univ, Sch Comp, SAIL, Kingston, ON, Canada
[2] Concordia Univ, Montreal, PQ, Canada
[3] Rochester Inst Technol, Rochester, NY 14623 USA
Keywords
Code quality; Topic modeling; LDA; Metrics; Cohesion; Coupling; Conceptual cohesion; Prediction
DOI
10.1016/j.jss.2016.05.015
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Researchers continue to propose metrics using measurable aspects of software systems to understand software quality. However, these metrics largely ignore the functionality, i.e., the conceptual concerns, of software systems. Such concerns are the technical concepts that reflect the system's business logic. For instance, while lines of code may be a good general measure for defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this paper, we study the effect of concerns on software quality. We use a statistical topic modeling approach to approximate software concerns as topics (related words in source code). We propose various metrics using these topics to help explain the file defect-proneness. Case studies on multiple versions of Firefox, Eclipse, Mylyn, and NetBeans show that (i) some topics are more defect-prone than others; (ii) defect-prone topics tend to remain so over time; (iii) our topic-based metrics provide additional explanatory power for software quality over existing structural and historical metrics; and (iv) our topic-based cohesion metric outperforms state-of-the-art topic-based cohesion and coupling metrics in terms of defect explanatory power, while being simpler to implement and more intuitive to interpret. (C) 2016 Elsevier Inc. All rights reserved.
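The abstract does not define the topic-based metrics, so the following is only a minimal sketch of one plausible way to score how defect-prone a topic is. It assumes each file already has a topic membership vector (for example, from an LDA run over the source code) and weights each file's defect density by that membership. The function name `topic_defect_density`, the input layout, and the sample file names and numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def topic_defect_density(
    memberships: Dict[str, List[float]],
    defects: Dict[str, int],
    loc: Dict[str, int],
) -> List[float]:
    """Membership-weighted defect density per topic (illustrative only).

    memberships[f] holds the topic weights of file f (assumed to come
    from a topic model such as LDA); defects[f] and loc[f] are the
    defect count and lines of code of file f.
    """
    n_topics = len(next(iter(memberships.values())))
    density = [0.0] * n_topics
    for name, theta in memberships.items():
        if loc[name] == 0:
            continue  # skip empty files to avoid division by zero
        file_density = defects[name] / loc[name]
        for k, weight in enumerate(theta):
            density[k] += weight * file_density
    return density


# Hypothetical data: a large I/O file with few defects and a small
# parser file with many, echoing the contrast drawn in the abstract.
memberships = {"io_utils.c": [0.9, 0.1], "parser.c": [0.2, 0.8]}
defects = {"io_utils.c": 1, "parser.c": 6}
loc = {"io_utils.c": 1000, "parser.c": 300}

scores = topic_defect_density(memberships, defects, loc)
```

With this sample data the second topic, which dominates the parser file, receives the higher score, which matches the intuition that some concerns are more defect-prone than others regardless of file size.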
Pages: 79-106
Page count: 28