When is resampling beneficial for feature selection with imbalanced wide data?

Cited by: 16
Authors
Ramos-Perez, Ismael [1]
Arnaiz-Gonzalez, Alvar [1]
Rodriguez, Juan J. [1]
Garcia-Osorio, Cesar [1]
Affiliations
[1] Univ Burgos, Dept Comp Engn, Escuela Politecn Super, Avda Cantabria S-N, Burgos 09006, Province Of Bur, Spain
Keywords
Feature selection; Wide data; High dimensional data; Very low sample size; Unbalanced; Machine learning; CLASSIFICATION; PERFORMANCE; DIAGNOSIS;
DOI
10.1016/j.eswa.2021.116015
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper studies how combinations of balancing and feature selection techniques affect wide data (many more attributes than instances) when different classifiers are used. To this end, an extensive study is conducted using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out with 5 classification algorithms, analyzing the results for different percentages of selected features and establishing statistical significance with Bayesian tests. Among the general conclusions, random undersampling (RUS) works better when applied before feature selection, while random oversampling (ROS) and SMOTE give better results when applied afterwards. Classifier-specific results are also obtained: for example, Gaussian SVM performs best when feature selection is done with SVM-RFE before balancing the data with RUS.
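The two orderings compared in the abstract (balancing before feature selection versus balancing after it) can be illustrated with a minimal sketch. This is not the authors' code: the helper names, the toy data, and the mean-difference filter used as the feature selector are all assumptions made for illustration, standing in for the balancing strategies and feature selection algorithms evaluated in the paper. SMOTE is omitted, and labels are assumed to be binary (0 = majority, 1 = minority).

```python
import random
from statistics import mean

def _group_by_class(X, y):
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups

def random_undersample(X, y, rng):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    Xb, yb = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):
            Xb.append(row)
            yb.append(label)
    return Xb, yb

def random_oversample(X, y, rng):
    """ROS: duplicate random rows of each class up to the largest class size."""
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    Xb, yb = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            Xb.append(row)
            yb.append(label)
    return Xb, yb

def select_features(X, y, fraction):
    """Hypothetical stand-in filter: rank features by the absolute difference
    of class means and keep the top `fraction` of them (binary labels 0/1)."""
    n_features = len(X[0])
    k = max(1, round(fraction * n_features))
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def project(X, cols):
    return [[row[j] for j in cols] for row in X]

def balance_then_select(X, y, balance, fraction, rng):
    """Ordering 1: balance first, then select features on the balanced data."""
    Xb, yb = balance(X, y, rng)
    cols = select_features(Xb, yb, fraction)
    return project(Xb, cols), yb, cols

def select_then_balance(X, y, balance, fraction, rng):
    """Ordering 2: select features on the original data, then balance."""
    cols = select_features(X, y, fraction)
    Xb, yb = balance(project(X, cols), y, rng)
    return Xb, yb, cols

# Toy wide data (assumed): 12 instances, 40 features, 9 majority vs 3 minority.
rng = random.Random(0)
y = [0] * 9 + [1] * 3
X = [[rng.gauss(0, 1) + (2.0 if label == 1 and j < 2 else 0.0)
      for j in range(40)] for label in y]

X_rus, y_rus, cols_rus = balance_then_select(X, y, random_undersample, 0.1, rng)
X_ros, y_ros, cols_ros = select_then_balance(X, y, random_oversample, 0.1, rng)
```

In a real experiment, both steps would be fitted on the training folds only, and the resulting classifier would be scored on untouched test data. The sketch only shows where the balancing step sits relative to the selection step.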
Pages: 12