When is resampling beneficial for feature selection with imbalanced wide data?

Cited by: 16
Authors
Ramos-Perez, Ismael [1]
Arnaiz-Gonzalez, Alvar [1]
Rodriguez, Juan J. [1]
Garcia-Osorio, Cesar [1]
Affiliations
[1] Univ Burgos, Dept Comp Engn, Escuela Politecn Super, Avda Cantabria S-N, Burgos 09006, Province Of Bur, Spain
Keywords
Feature selection; Wide data; High dimensional data; Very low sample size; Unbalanced; Machine learning; CLASSIFICATION; PERFORMANCE; DIAGNOSIS;
DOI
10.1016/j.eswa.2021.116015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies how combinations of balancing and feature selection techniques affect wide data (data with many more attributes than instances) when different classifiers are used. To this end, an extensive study is carried out using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation uses 5 classification algorithms, analyzes the results for different percentages of selected features, and establishes statistical significance with Bayesian tests. A general conclusion is that random undersampling (RUS) works better when applied before feature selection, whereas random oversampling (ROS) and SMOTE give better results when applied afterwards. Classifier-specific results are also obtained: for example, for Gaussian SVM the best performance is obtained when feature selection is done with SVM-RFE before balancing the data with RUS.
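The two orderings contrasted in the abstract (balancing before feature selection versus balancing after it) can be illustrated with a minimal sketch. This is not the authors' code: the scoring function `top_features` is a toy stand-in rather than one of the 7 feature selection algorithms evaluated in the paper, SMOTE is omitted, and the helper names and the synthetic data generator `make_toy_wide_data` are hypothetical, introduced only for illustration.

```python
import random
from statistics import mean, pstdev


def group_by_class(X, y):
    """Collect the rows of X under their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, rng):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    groups = group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    Xb, yb = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):
            Xb.append(row)
            yb.append(label)
    return Xb, yb


def random_oversample(X, y, rng):
    """ROS: duplicate random rows of each class up to the largest class size."""
    groups = group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    Xb, yb = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            Xb.append(row)
            yb.append(label)
    return Xb, yb


def top_features(X, y, fraction):
    """Toy two-class filter (an assumption, not a method from the paper):
    rank features by |difference of class means| / summed class std."""
    neg, pos = sorted(set(y))
    a = [row for row, label in zip(X, y) if label == neg]
    b = [row for row, label in zip(X, y) if label == pos]
    scores = []
    for j in range(len(X[0])):
        col_a = [row[j] for row in a]
        col_b = [row[j] for row in b]
        spread = pstdev(col_a) + pstdev(col_b) + 1e-12
        scores.append(abs(mean(col_a) - mean(col_b)) / spread)
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def rus_then_select(X, y, fraction, rng):
    """Ordering 1: balance with RUS first, then select features."""
    Xb, yb = random_undersample(X, y, rng)
    cols = top_features(Xb, yb, fraction)
    return project(Xb, cols), yb, cols


def select_then_ros(X, y, fraction, rng):
    """Ordering 2: select features on the original data, then balance with ROS."""
    cols = top_features(X, y, fraction)
    Xb, yb = random_oversample(project(X, cols), y, rng)
    return Xb, yb, cols


def make_toy_wide_data(rng, n_major=9, n_minor=3, n_features=40):
    """Hypothetical wide, imbalanced data: 12 rows, 40 features, feature 0 shifted."""
    y = [0] * n_major + [1] * n_minor
    X = []
    for label in y:
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        X.append(row)
    return X, y
```

In an actual experiment, both steps would be fitted on the training folds only and the result passed to a classifier; the sketch stops at producing the balanced, reduced training set so that the difference between the two orderings stays visible.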
Pages: 12
Related papers
50 in total
  • [21] A hybrid stacking classifier with feature selection for handling imbalanced data
    Abraham, Asha
    Kayalvizhi, R.
    Mohideen, Habeeb Shaik
    [J]. Journal of Intelligent and Fuzzy Systems, 2024, 46 (04) : 9103 - 9117
  • [22] A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data
    Pouramini, Jafar
    Minaei-Bidgoli, Behrouze
    Esmaeili, Mahdi
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2018, 12 (08) : 3725 - 3748
  • [23] Feature selection for imbalanced data with deep sparse autoencoders ensemble
    Massi, Michela Carlotta
    Gasperoni, Francesca
    Ieva, Francesca
    Paganoni, Anna Maria
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2022, 15 (03) : 376 - 395
  • [24] Smart Robust Feature Selection (SoFt) for imbalanced and heterogeneous data
    Kasim, Henry
    King, Stephen
    Lee, Gary Kee Khoon
    Sirigina, Rajendra Prasad
    How, Shannon Shi Qi
    Hung, Terence Gih Guang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 236
  • [25] Evolutionary multistage multitasking method for feature selection in imbalanced data
    Ding, Weiping
    Yao, Hongcheng
    Huang, Jiashuang
    Hou, Tao
    Geng, Yu
    [J]. Swarm and Evolutionary Computation, 2025, 92
  • [26] An effective distance based feature selection approach for imbalanced data
    Shahee, Shaukat Ali
    Ananthakumar, Usha
    [J]. APPLIED INTELLIGENCE, 2020, 50 (03) : 717 - 745
  • [28] Feature selection via minimizing global redundancy for imbalanced data
    Shuhao Huang
    Hongmei Chen
    Tianrui Li
    Hao Chen
    Chuan Luo
    [J]. Applied Intelligence, 2022, 52 : 8685 - 8707
  • [29] Weighted Gini Index Feature Selection Method for Imbalanced Data
    Liu, Haoyue
    Zhou, MengChu
    Lu, Xiaoyu Sean
    Yao, Cynthia
    [J]. 2018 IEEE 15TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL (ICNSC), 2018
  • [30] An Approach Based on Resampling and Feature Selection to Improve the Classification of Microarray Data
    Soleymani, Nafiseh
    Moattar, Mohammad Hussein
    [J]. 2018 6TH IRANIAN JOINT CONGRESS ON FUZZY AND INTELLIGENT SYSTEMS (CFIS), 2018 : 61 - 64