An empirical study on the joint impact of feature selection and data resampling on imbalance classification

Cited by: 27
Authors
Zhang, Chongsheng [1]
Soda, Paolo [2, 3]
Bi, Jingjun [1]
Fan, Gaojuan [1]
Almpanidis, George [1]
Garcia, Salvador [4]
Ding, Weiping [5]
Affiliations
[1] Henan Univ, Henan Key Lab Big Data Anal & Proc, Kaifeng, Henan, Peoples R China
[2] Univ Campus Biomed Rome, Dept Engn, Rome, Italy
[3] Umea Univ, Dept Radiat Sci, Biomed Engn, Radiat Phys, Umea, Sweden
[4] Univ Granada, DaSCI Andalusian Res Inst, Granada, Spain
[5] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
Keywords
Imbalanced classification; Feature selection; Data selection; Resampling; SMOTE
DOI
10.1007/s10489-022-03772-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many real-world datasets exhibit imbalanced distributions, in which the majority classes have sufficient samples, whereas the minority classes often have a very small number of samples. Data resampling has proven to be effective in alleviating such imbalanced settings, while feature selection is a commonly used technique for improving classification performance. However, the joint impact of feature selection and data resampling on two-class imbalance classification has rarely been addressed before. This work investigates the performance of two opposite imbalanced classification frameworks in which feature selection is applied before or after data resampling. We conduct a large-scale empirical study with a total of 9225 experiments on 52 publicly available datasets. The results show that both frameworks should be considered for finding the best performing imbalanced classification model. We also study the impact of classifiers, the ratio between the number of majority and minority samples (IR), and the ratio between the number of samples and features (SFR) on the performance of imbalance classification. Overall, this work provides a new reference value for researchers and practitioners in imbalance learning.
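The two quantities and the two pipeline orders named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain random oversampling stands in for the resampling methods studied (such as SMOTE), the feature selector is a hypothetical filter that ranks features by the gap between class means, and all function names and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean

def imbalance_ratio(y):
    """IR: number of majority samples divided by number of minority samples."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

def samples_to_features_ratio(X):
    """SFR: number of samples divided by number of features."""
    return len(X) / len(X[0])

def random_oversample(X, y, seed=0):
    """Stand-in resampler (assumption): duplicate random minority rows
    until the two classes have the same number of samples."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    pool = [row for row, label in zip(X, y) if label == minority]
    deficit = max(counts.values()) - counts[minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit

def select_top_k(X, y, k):
    """Hypothetical filter selector: keep the k features whose class means
    differ the most (two-class labels assumed)."""
    a, b = sorted(set(y))
    def gap(j):
        mean_a = mean(row[j] for row, label in zip(X, y) if label == a)
        mean_b = mean(row[j] for row, label in zip(X, y) if label == b)
        return abs(mean_a - mean_b)
    ranked = sorted(range(len(X[0])), key=gap, reverse=True)
    return sorted(ranked[:k])

def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]

def fs_then_resample(X, y, k):
    """Framework A: feature selection first, then resampling."""
    cols = select_top_k(X, y, k)
    X_res, y_res = random_oversample(project(X, cols), y)
    return X_res, y_res, cols

def resample_then_fs(X, y, k):
    """Framework B: resampling first, then feature selection."""
    X_res, y_res = random_oversample(X, y)
    cols = select_top_k(X_res, y_res, k)
    return project(X_res, cols), y_res, cols

# Toy two-class dataset (made up): 6 majority rows, 2 minority rows, 3 features.
X = [[0.0, 1.0, 5.0], [0.1, 1.1, 5.0], [0.2, 0.9, 5.1], [0.1, 1.0, 4.9],
     [0.0, 1.2, 5.0], [0.2, 1.0, 5.0], [3.0, 1.0, 5.0], [3.2, 1.1, 5.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

print("IR:", imbalance_ratio(y))
print("SFR:", samples_to_features_ratio(X))
print("FS -> resampling keeps features:", fs_then_resample(X, y, k=1)[2])
print("Resampling -> FS keeps features:", resample_then_fs(X, y, k=1)[2])
```

On real data the two orders can select different features, because the selector in Framework B sees the resampled class distribution. In practice either pipeline would normally be fitted on training folds only, so that duplicated or synthetic samples do not leak into the evaluation data.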
Pages: 5449-5461
Number of pages: 13
Related papers
50 in total
  • [1] An empirical study on the joint impact of feature selection and data resampling on imbalance classification
    Chongsheng Zhang
    Paolo Soda
    Jingjun Bi
    Gaojuan Fan
    George Almpanidis
    Salvador García
    Weiping Ding
    Applied Intelligence, 2023, 53 : 5449 - 5461
  • [2] Correction to: An empirical study on the joint impact of feature selection and data resampling on imbalance classification
    Chongsheng Zhang
    Paolo Soda
    Jingjun Bi
    Gaojuan Fan
    George Almpanidis
    Salvador García
    Weiping Ding
    Applied Intelligence, 2023, 53 : 8506 - 8506
  • [3] An empirical study on the joint impact of feature selection and data resampling on imbalance classification (Jun, 10.1007/s10489-022-03772-1, 2022)
    Zhang, Chongsheng
    Soda, Paolo
    Bi, Jingjun
    Fan, Gaojuan
    Almpanidis, George
    Garcia, Salvador
    Ding, Weiping
    APPLIED INTELLIGENCE, 2023, 53 (07) : 8506 - 8506
  • [4] Feature Selection and Resampling in Class Imbalance Learning: Which Comes First? An Empirical Study in the Biological Domain
    Zhang, Chongsheng
    Bi, Jingjun
    Soda, Paolo
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 933 - 938
  • [5] An Approach Based on Resampling and Feature Selection to Improve the Classification of Microarray Data
    Soleymani, Nafiseh
    Moattar, Mohammad Hussein
    2018 6TH IRANIAN JOINT CONGRESS ON FUZZY AND INTELLIGENT SYSTEMS (CFIS), 2018, : 61 - 64
  • [6] Comprehensive empirical investigation for prioritizing the pipeline of using feature selection and data resampling techniques
    Tyagi, Pooja
    Singh, Jaspreeti
    Gosain, Anjana
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (03) : 6019 - 6040
  • [7] Feature selection in imbalance data sets
    Jamali, Ilnaz
    Bazmara, Mohammad
    Jafari, Shahram
    International Journal of Computer Science Issues, 2012, 9 (3 3-2) : 42 - 45
  • [8] Similarity of feature selection methods: An empirical study across data intensive classification tasks
    Dessi, Nicoletta
    Pes, Barbara
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (10) : 4632 - 4642
  • [9] Impact of feature selection methods on data classification for IDS
    Jiang, Shuai
    Xu, Xiaolong
    2019 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC), 2019, : 174 - 180
  • [10] Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data
    Nieto-del-Amor, Felix
    Prats-Boluda, Gema
    Garcia-Casado, Javier
    Diaz-Martinez, Alba
    Jose Diago-Almela, Vicente
    Monfort-Ortiz, Rogelio
    Hao, Dongmei
    Ye-Lin, Yiyao
    SENSORS, 2022, 22 (14)