Empirical Evaluation of the Impact of Class Overlap on Software Defect Prediction

Cited by: 32
Authors
Gong, Lina [1, 2, 3]
Jiang, Shujuan [1, 2]
Wang, Rongcun [1, 2]
Jiang, Li [1, 2]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ, Mine Digitizat Engn Res Ctr, Xuzhou 221116, Jiangsu, Peoples R China
[3] Zaozhuang Univ, Dept Informat Sci & Engn, Zaozhuang 277160, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class overlap; Software defect prediction; K Means clustering; Machine learning; MACHINE;
DOI
10.1109/ASE.2019.00071
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Software defect prediction (SDP) uses learning models to detect the defective modules in a project, and the performance of these models depends on the quality of the training data. Previous research has mainly focused on two data quality problems: class imbalance and feature redundancy. However, training data often contains instances that belong to different classes but have similar feature values, and the resulting class overlap also degrades the quality of the training data. Our goal is to investigate the impact of class overlap on software defect prediction. We also propose an improved K-Means clustering cleaning approach (IKMCCA) to address both the class overlap and class imbalance problems. Specifically, we check whether the K-Means clustering cleaning approach (KMCCA), neighborhood cleaning learning (NCL), or IKMCCA can improve defect detection performance in two cases: (i) within-project defect prediction (WPDP) and (ii) cross-project defect prediction (CPDP). To obtain an objective estimate of the impact of class overlap, we carry out our investigation on 28 open source projects and compare the performance of state-of-the-art learning models in both cases when using IKMCCA, KMCCA, or NCL versus using no data cleaning. The experimental results show that learning models obtain significantly better performance in terms of balance, Recall, and AUC for both WPDP and CPDP when the overlapping instances are removed. Moreover, it is better to consider both class overlap and class imbalance.
Pages: 710 - 721
Number of pages: 12