An empirical study on the joint impact of feature selection and data resampling on imbalance classification

Cited by: 27
Authors
Zhang, Chongsheng [1]
Soda, Paolo [2,3]
Bi, Jingjun [1]
Fan, Gaojuan [1]
Almpanidis, George [1]
Garcia, Salvador [4]
Ding, Weiping [5]
Affiliations
[1] Henan Univ, Henan Key Lab Big Data Anal & Proc, Kaifeng, Henan, Peoples R China
[2] Univ Campus Biomed Rome, Dept Engn, Rome, Italy
[3] Umea Univ, Dept Radiat Sci, Biomed Engn, Radiat Phys, Umea, Sweden
[4] Univ Granada, DaSCI Andalusian Res Inst, Granada, Spain
[5] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
Keywords
Imbalanced classification; Feature selection; Data selection; Resampling; SMOTE
DOI
10.1007/s10489-022-03772-1
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many real-world datasets exhibit imbalanced distributions, in which the majority classes have sufficient samples, whereas the minority classes often have very few samples. Data resampling has proven effective in alleviating such imbalance, while feature selection is a commonly used technique for improving classification performance. However, the joint impact of feature selection and data resampling on two-class imbalanced classification has rarely been studied. This work investigates the performance of two opposite imbalanced classification frameworks, in which feature selection is applied either before or after data resampling. We conduct a large-scale empirical study with a total of 9225 experiments on 52 publicly available datasets. The results show that both frameworks should be considered when searching for the best-performing imbalanced classification model. We also study how the choice of classifier, the ratio between the number of majority and minority samples (IR), and the ratio between the number of samples and features (SFR) affect imbalanced classification performance. Overall, this work provides a new reference for researchers and practitioners in imbalanced learning.
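To make the two frameworks and the two dataset descriptors concrete, here is a minimal sketch in plain Python. It is not the authors' experimental code. Several choices are assumptions made for illustration: random oversampling stands in for SMOTE (SMOTE would interpolate new minority samples rather than duplicate existing ones), a simple Fisher-style score |mean_a - mean_b| / (std_a + std_b) stands in for the feature selector, and all function names (`imbalance_ratio`, `sample_feature_ratio`, `fs_then_resample`, `resample_then_fs`, and so on) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def imbalance_ratio(y):
    """IR: majority-class count divided by minority-class count (two-class case)."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def sample_feature_ratio(X):
    """SFR: number of samples divided by number of features."""
    return len(X) / len(X[0])


def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until both classes are the same size.
    Assumption: a simple stand-in for SMOTE, which would synthesize new rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    extra = [rng.choice(minority_rows) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit


def fisher_style_scores(X, y):
    """Hypothetical filter score per feature: |mean_a - mean_b| / (std_a + std_b)."""
    a, b = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col_a = [row[j] for row, label in zip(X, y) if label == a]
        col_b = [row[j] for row, label in zip(X, y) if label == b]
        spread = pstdev(col_a) + pstdev(col_b)
        scores.append(abs(mean(col_a) - mean(col_b)) / (spread or 1e-12))
    return scores


def select_top_k(X, y, k):
    """Return the (sorted) indices of the k highest-scoring features."""
    scores = fisher_style_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def fs_then_resample(X, y, k, seed=0):
    """Framework 1: select features on the original imbalanced data, then resample."""
    cols = select_top_k(X, y, k)
    X_res, y_res = random_oversample(project(X, cols), y, seed)
    return cols, X_res, y_res


def resample_then_fs(X, y, k, seed=0):
    """Framework 2: resample first, then select features on the balanced data."""
    X_res, y_res = random_oversample(X, y, seed)
    cols = select_top_k(X_res, y_res, k)
    return cols, project(X_res, cols), y_res


# Toy two-class dataset: 6 majority (0) and 2 minority (1) samples, 3 features.
X = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.1], [0.9, 5.1, 0.3], [1.1, 5.2, 0.2],
     [1.0, 4.9, 0.25], [1.3, 5.0, 0.15], [3.0, 5.1, 0.9], [2.8, 4.9, 0.7]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

print(imbalance_ratio(y))       # 3.0
print(sample_feature_ratio(X))  # 8 samples / 3 features
print(fs_then_resample(X, y, k=2)[0])
print(resample_then_fs(X, y, k=2)[0])
```

Because the filter score is computed on different data in each framework (the imbalanced original versus the rebalanced set), the two orderings can select different features; on this tiny toy set they happen to agree, but the abstract's finding is that neither ordering dominates across datasets, which is why both are worth evaluating.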
Pages: 5449-5461
Number of pages: 13