A neighborhood rough sets-based ensemble method, with application to software fault prediction

Cited by: 0
Authors
Jiang, Feng [1]
Hu, Qiang [1]
Yang, Zhiyong [1]
Liu, Jinhuan [2]
Du, Junwei [2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266061, Peoples R China
[2] Qingdao Univ Sci & Technol, Sch Data Sci, Qingdao 266061, Peoples R China
Keywords
Ensemble learning; Software fault prediction; Neighborhood rough sets; Reduct; Neighborhood approximate reduct; Imbalanced data; SYSTEM;
DOI
10.1016/j.eswa.2024.125919
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Software fault prediction (SFP) aims to detect fault-prone software modules, which helps allocate software testing resources and improve software quality. Recently, ensemble learning (EL)-based SFP methods have attracted much attention. Although many EL algorithms have been applied to SFP, they still struggle to generate multiple base learners that are both accurate and diverse. To address this, this paper presents a multi-modal EL algorithm based on neighborhood rough sets, called NRSEL. In NRSEL, the neighborhood approximate reduct (NAR) technique perturbs the attribute space, and bootstrap sampling perturbs the sample space. As a novel technique for perturbing the attribute space, NAR stems from the concept of approximate reducts in rough sets. We also apply NRSEL to SFP and employ a hybrid scheme, called SMOTE-NRSEL, to handle the problem of imbalanced data in SFP. We compare SMOTE-NRSEL with existing EL algorithms on 20 public datasets. Experimental results indicate that SMOTE-NRSEL is effective for SFP: compared with the baseline algorithms, it improves AUC, F1-score, and MCC by 3.09%, 3.18%, and 7.5% on average, respectively. Moreover, three statistical tests (the paired t-test, the Friedman test, and the Nemenyi test) indicate that SMOTE-NRSEL is significantly better than the baseline algorithms in most cases. This paper shows that NAR is a good choice for perturbing the attribute space: with NAR and the multi-modal perturbation strategy built on it, SMOTE-NRSEL can generate accurate and diverse base learners. The code is available at https://github.com/jiangfeng0278/NRSEL.
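To make the two perturbations in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' NRSEL implementation (that is in the linked repository). It assumes Euclidean delta-neighborhoods, the usual neighborhood rough set dependency |POS_B(D)| / |U|, and a hypothetical "approximate reduct" defined as an attribute subset that keeps at least a fraction `alpha` of the full attribute set's dependency, built by adding attributes in a random order. The 1-nearest-neighbor base learner, majority voting, and all function names (`neighborhood_dependency`, `random_approximate_reduct`, `fit_ensemble`, `predict_ensemble`) are illustrative assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = List[float]
# A base learner: (bootstrap features, bootstrap labels, selected attribute indices)
Learner = Tuple[List[Row], List[int], List[int]]


def neighborhood_dependency(X: Sequence[Row], y: Sequence[int],
                            attrs: Sequence[int], delta: float) -> float:
    """|POS_B(D)| / |U|: share of samples whose delta-neighborhood
    (Euclidean distance over `attrs`) contains only samples of their own class."""
    pos = 0
    for i in range(len(X)):
        pure = all(
            y[j] == y[i]
            for j in range(len(X))
            if math.sqrt(sum((X[i][a] - X[j][a]) ** 2 for a in attrs)) <= delta
        )
        if pure:
            pos += 1
    return pos / len(X)


def random_approximate_reduct(X, y, delta: float, alpha: float,
                              rng: random.Random) -> List[int]:
    """Hypothetical approximate reduct (assumption): add attributes in random
    order until the subset keeps >= alpha * the full set's dependency."""
    m = len(X[0])
    target = alpha * neighborhood_dependency(X, y, range(m), delta)
    order = list(range(m))
    rng.shuffle(order)  # a different order per call -> diverse attribute subsets
    chosen: List[int] = []
    for a in order:
        chosen.append(a)
        if neighborhood_dependency(X, y, chosen, delta) >= target:
            break
    return sorted(chosen)


def predict_1nn(train_X, train_y, attrs, x) -> int:
    """1-nearest-neighbor prediction using only the selected attributes."""
    best = min(range(len(train_X)),
               key=lambda i: sum((train_X[i][a] - x[a]) ** 2 for a in attrs))
    return train_y[best]


def fit_ensemble(X, y, n_learners: int = 5, delta: float = 0.15,
                 alpha: float = 0.9, seed: int = 0) -> List[Learner]:
    rng = random.Random(seed)
    n = len(X)
    learners: List[Learner] = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: perturb sample space
        bX, by = [X[i] for i in idx], [y[i] for i in idx]
        attrs = random_approximate_reduct(bX, by, delta, alpha, rng)  # perturb attribute space
        learners.append((bX, by, attrs))
    return learners


def predict_ensemble(learners: List[Learner], x: Row) -> int:
    """Majority vote over the base learners."""
    votes = Counter(predict_1nn(bX, by, attrs, x) for bX, by, attrs in learners)
    return votes.most_common(1)[0][0]


# Toy data: attribute 0 separates the classes, attribute 1 is noise.
X = [[0.10, 0.5], [0.20, 0.9], [0.15, 0.1], [0.90, 0.4], [0.80, 0.8], [0.85, 0.2]]
y = [0, 0, 0, 1, 1, 1]
print(neighborhood_dependency(X, y, [0, 1], 0.15))  # 1.0
print(neighborhood_dependency(X, y, [1], 0.15))     # 0.0
learners = fit_ensemble(X, y, n_learners=5, seed=1)
print([attrs for _, _, attrs in learners])  # selected subsets vary with the seed
print(predict_ensemble(learners, [0.12, 0.6]))
```

On the toy data the noise attribute alone has zero dependency, so any subset that reaches the `alpha` threshold must include attribute 0. Because each learner sees a different bootstrap sample and a differently shuffled attribute order, the learners end up with different views of the data, which is the source of diversity the abstract refers to.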
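The abstract names SMOTE as the imbalance-handling half of the hybrid SMOTE-NRSEL scheme. As a rough illustration of the standard SMOTE idea (new minority samples interpolated between a minority sample and one of its k nearest minority neighbors), here is a plain-Python sketch. The function name `smote`, its parameters, and the choice to pick base samples at random (rather than a fixed number per minority sample) are assumptions; where exactly the paper applies oversampling relative to the ensemble is not stated in the abstract.

```python
import math
import random
from typing import List


def smote(minority: List[List[float]], n_new: int, k: int = 3,
          seed: int = 0) -> List[List[float]]:
    """Hypothetical SMOTE sketch: create n_new synthetic minority samples by
    interpolating between a random minority sample and one of its k nearest
    minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of `base`, excluding base itself
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbors)]
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nb
        synthetic.append([b + gap * (v - b) for b, v in zip(base, nb)])
    return synthetic


# Usage: fault-prone modules (minority class) oversampled before training.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2, seed=42)
print(len(new_points))  # 6
```

Every synthetic point lies on a line segment between two real minority samples, so oversampling fills in the minority region instead of duplicating points, which is why SMOTE is a common companion to ensemble methods on imbalanced SFP data.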
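The reported gains are measured in AUC, F1-score, and MCC. For reference, F1 and the Matthews correlation coefficient (MCC) can be computed from a binary confusion matrix as below, treating fault-prone as the positive class (label 1). AUC is omitted because it needs ranking scores rather than hard labels. Returning 0.0 when a denominator is zero is a common convention and an assumption here; the function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 1 = fault-prone, 0 = not fault-prone."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred) -> float:
    """F1 = 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# TP = 5, FN = 1, TN = 3, FP = 1
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
print(round(f1_score(y_true, y_pred), 4))  # 0.8333
print(round(mcc(y_true, y_pred), 4))       # 0.5833
```

MCC uses all four cells of the confusion matrix, so unlike accuracy it is not inflated by simply predicting the majority (non-faulty) class, which makes it informative on imbalanced SFP datasets.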
Pages: 17