Neighbor cleaning learning based cost-sensitive ensemble learning approach for software defect prediction

Cited by: 1
Authors
Li, Li [1]
Su, Renjia [1]
Zhao, Xin [1]
Affiliations
[1] Northeast Forestry Univ, Sch Comp & Control Engn, Harbin, Peoples R China
Source
Keywords
class imbalance; class overlap; cost-sensitive learning; machine learning; software defect prediction
DOI
10.1002/cpe.8017
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Class imbalance in software defect prediction datasets biases predictions toward the majority class, and class overlap blurs the classification decision boundary; both degrade a model's prediction performance on the dataset. Neighbor cleaning learning (NCL) is an effective technique for defect prediction. To address the class overlap and class imbalance problems together, an NCL-based cost-sensitive ensemble learning model for software defect prediction (NCL_CSEL) is proposed. First, base classifiers are trained on bootstrap-resampled data. The resulting classifiers are then combined by a static ensemble to obtain the final classification results. The base classifier is an Adaptive Boosting (AdaBoost) classifier that combines NCL with cost-sensitive learning; it handles class overlap and class imbalance by balancing the proportion of overlapping samples removed by NCL against the size of the cost factor in cost-sensitive learning. Specifically, the NCL algorithm initializes the sample weights, and the cost-sensitive method updates them. Experiments on the NASA and AEEEM datasets show that adding the NCL algorithm improves the defect prediction model's bal value by approximately 7% and its AUC value by 9.5%. NCL_CSEL effectively addresses the class imbalance problem and significantly improves prediction performance compared with existing methods for handling class imbalance.
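The two weighting steps named in the abstract (NCL to initialize sample weights, a cost factor to update them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names `ncl_initial_weights` and `cost_sensitive_update`, the zero-weight treatment of overlapping samples, the simplified one-pass neighbor check, the label encoding (+1 defective, -1 non-defective), and the form of the cost-weighted update are all assumptions made for illustration.

```python
import math


def ncl_initial_weights(X, y, k=3):
    """Assumed NCL-style initialisation (not taken from the paper).

    Labels: +1 = defective (minority), -1 = non-defective (majority).
    A majority sample whose k nearest neighbours are mostly minority
    samples is treated as overlapping and gets weight 0.
    """
    weights = []
    for i, xi in enumerate(X):
        # (distance, index) pairs for the k nearest neighbours of sample i.
        nearest = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        neighbour_vote = sum(y[j] for _, j in nearest)
        overlapping = y[i] == -1 and neighbour_vote > 0
        weights.append(0.0 if overlapping else 1.0)
    total = sum(weights)
    return [w / total for w in weights]


def cost_sensitive_update(weights, y, pred, alpha, cost=2.0):
    """Assumed cost-sensitive AdaBoost update (hypothetical form).

    Standard AdaBoost factor exp(-alpha * y * h(x)), with an extra
    multiplier `cost` on missed defects (y = +1 predicted as -1).
    """
    updated = []
    for w, yi, pi in zip(weights, y, pred):
        factor = math.exp(-alpha * yi * pi)
        if yi == 1 and pi != yi:
            factor *= cost  # penalise missed defects more heavily
        updated.append(w * factor)
    total = sum(updated)
    return [w / total for w in updated]


# Toy 1-D data: sample 4 is a majority point sitting inside the minority cluster.
X = [[0.0], [0.1], [0.2], [0.3], [1.05], [1.0], [1.1], [1.2]]
y = [-1, -1, -1, -1, -1, 1, 1, 1]
w0 = ncl_initial_weights(X, y, k=3)

# Hypothetical weak-learner output: one false alarm (index 3), one missed defect (index 5).
pred = [-1, -1, -1, 1, -1, -1, 1, 1]
w1 = cost_sensitive_update(w0, y, pred, alpha=0.5, cost=2.0)
```

In this sketch the overlapping majority sample keeps zero weight through later rounds, while a missed defect ends up with `cost` times the weight of an equally weighted false alarm. The full NCL_CSEL model described above would additionally train such boosted classifiers on bootstrap resamples and combine them in a static ensemble, which is omitted here.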
Pages: 14