Empirical Evaluation of the Impact of Class Overlap on Software Defect Prediction

Cited by: 32
Authors
Gong, Lina [1, 2, 3]
Jiang, Shujuan [1, 2]
Wang, Rongcun [1, 2]
Jiang, Li [1, 2]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ, Mine Digitizat Engn Res Ctr, Xuzhou 221116, Jiangsu, Peoples R China
[3] Zaozhuang Univ, Dept Informat Sci & Engn, Zaozhuang 277160, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Class overlap; Software defect prediction; K-Means clustering; Machine learning; MACHINE
DOI
10.1109/ASE.2019.00071
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Software defect prediction (SDP) uses learning models to detect defective modules in a project, and the performance of these models depends on the quality of the training data. Previous research has mainly focused on two data-quality problems: class imbalance and feature redundancy. However, training data often contain instances that belong to different classes but have similar feature values; this class overlap also degrades the quality of the training data. Our goal is to investigate the impact of class overlap on software defect prediction. We also propose an improved K-Means clustering cleaning approach (IKMCCA) that addresses both the class overlap and the class imbalance problems. Specifically, we check whether the K-Means clustering cleaning approach (KMCCA), neighborhood cleaning learning (NCL), or IKMCCA improves defect detection performance in two settings: (i) within-project defect prediction (WPDP) and (ii) cross-project defect prediction (CPDP). To obtain an objective estimate of the impact of class overlap, we carry out our investigation on 28 open-source projects and compare the performance of state-of-the-art learning models in both settings when they are trained on data cleaned by IKMCCA, KMCCA, or NCL vs. uncleaned data. The experimental results show that learning models achieve significantly better balance, Recall, and AUC in both WPDP and CPDP when the overlapping instances are removed. Moreover, it is better to consider class overlap and class imbalance together.
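The abstract names a K-Means clustering cleaning step but does not spell out its removal rule. The sketch below is a minimal illustration under stated assumptions, not the authors' KMCCA or IKMCCA: it clusters the training instances with plain Lloyd's K-Means (using a deterministic farthest-first initialization, chosen here only for reproducibility) and, in any cluster where both labels occur and one label holds a strict majority, drops the minority-label instances as overlapping. The function names `kmeans` and `kmeans_cluster_clean` and the strict-majority rule are hypothetical. The sketch does not attempt the class-imbalance handling that IKMCCA adds.

```python
from collections import Counter


def _sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-Means; returns one cluster index per point.

    Assumption: deterministic farthest-first initialization (not k-means++),
    chosen only so that results are reproducible.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_cluster_clean(X, y, k):
    """Hypothetical cluster-based cleaning rule (illustration only).

    In every cluster that mixes both labels and has a strict-majority label,
    instances carrying the minority label are treated as overlapping and
    dropped. Pure clusters and evenly split clusters are kept unchanged.
    """
    clusters = kmeans(X, k)
    keep = []
    for c in set(clusters):
        idx = [i for i, lab in enumerate(clusters) if lab == c]
        counts = Counter(y[i] for i in idx)
        majority, top = counts.most_common(1)[0]
        if len(counts) > 1 and 2 * top > len(idx):
            keep.extend(i for i in idx if y[i] == majority)
        else:
            keep.extend(idx)
    keep.sort()  # preserve the original instance order
    return [X[i] for i in keep], [y[i] for i in keep]


# Usage: two well-separated groups, each containing one "overlapping" instance.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 0, 0, 1, 1, 1, 1, 0]  # 1 = defective, 0 = clean
X_clean, y_clean = kmeans_cluster_clean(X, y, k=2)
print(y_clean)  # [0, 0, 0, 1, 1, 1]
```

Because this rule only removes instances, it can also shrink the already scarce defective class (one defective instance is dropped in the usage example), which illustrates why the abstract argues for considering class overlap and class imbalance together.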
Pages: 710-721
Number of pages: 12