An Extensive Performance Comparison between Feature Reduction and Feature Selection Preprocessing Algorithms on Imbalanced Wide Data

Cited by: 0
Authors
Ramos-Perez, Ismael [1]
Barbero-Aparicio, Jose Antonio [1]
Canepa-Oneto, Antonio [1]
Arnaiz-Gonzalez, Alvar [1]
Maudes-Raedo, Jesus [1]
Affiliations
[1] Univ Burgos, Escuela Politecn Super, Dept Comp Engn, Avda Cantabria S-N, Burgos 09006, Spain
Keywords
feature selection; feature reduction; wide data; high dimensional data; imbalanced data; machine learning; DIMENSIONALITY REDUCTION; STATISTICAL COMPARISONS; GENETIC ALGORITHM; CLASSIFICATION; CLASSIFIERS; PROJECTIONS; SYSTEMS; FIT;
DOI
10.3390/info15040223
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The most common preprocessing techniques for datasets with high dimensionality and few instances (so-called wide data) are feature reduction (FR), feature selection (FS), and resampling. This study explores the use of FR and resampling techniques, expanding the limited comparisons between FR and filter FS methods in the existing literature, especially in the context of wide data. We compare the best results from a previous comprehensive study of FS against new experiments conducted using FR methods. Two specific challenges associated with the use of FR are outlined in detail: finding FR methods that are compatible with wide data, and the need for nonlinear approaches to have a reduction estimator in order to process out-of-sample data. The experimental study compares 17 techniques, including supervised, unsupervised, linear, and nonlinear approaches, using 7 resampling strategies and 5 classifiers. The results show which configurations are optimal in terms of performance and computation time. Moreover, the best configuration, namely k-Nearest Neighbors (KNN) combined with the Maximal Margin Criterion (MMC) feature reducer and no resampling, is shown to outperform state-of-the-art algorithms.
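To make the best-performing configuration more concrete, the following is a minimal sketch of a linear MMC-style reducer followed by KNN, with no resampling. It is not the authors' implementation. It assumes the commonly used formulation of the criterion, in which the projection consists of the leading eigenvectors of S_b - S_w (between-class minus within-class scatter). The helper names `fit_mmc` and `transform_mmc`, the synthetic data, and every parameter value are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def fit_mmc(X, y, n_components):
    """Fit an MMC-style linear projection (assumed formulation).

    Returns the global mean and a matrix W whose columns are the
    eigenvectors of (S_b - S_w) with the largest eigenvalues.
    """
    mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    for label in np.unique(y):
        X_c = X[y == label]
        prior = X_c.shape[0] / X.shape[0]
        mean_c = X_c.mean(axis=0)
        diff = (mean_c - mean)[:, None]
        S_b += prior * (diff @ diff.T)
        centered = X_c - mean_c
        S_w += prior * (centered.T @ centered) / X_c.shape[0]
    # S_b - S_w is symmetric, so eigh applies; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def transform_mmc(X, mean, W):
    """Project samples (including unseen ones) with the fitted matrix."""
    return (X - mean) @ W


# Hypothetical imbalanced wide data: 40 instances, 200 features, 30 vs 10.
rng = np.random.default_rng(0)
n_features = 200
X_major = rng.normal(0.0, 1.0, size=(30, n_features))
X_minor = rng.normal(0.0, 1.0, size=(10, n_features))
X_minor[:, :5] += 3.0  # shift a few features so the classes differ
X = np.vstack([X_major, X_minor])
y = np.array([0] * 30 + [1] * 10)

# Simple hold-out split: every fourth instance goes to the test set.
test_mask = np.arange(X.shape[0]) % 4 == 0
X_train, y_train = X[~test_mask], y[~test_mask]
X_test, y_test = X[test_mask], y[test_mask]

# Fit the reducer on training data only, then reuse it for the test data.
mean, W = fit_mmc(X_train, y_train, n_components=2)
Z_train = transform_mmc(X_train, mean, W)
Z_test = transform_mmc(X_test, mean, W)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(Z_train, y_train)
pred = knn.predict(Z_test)
print("test accuracy:", (pred == y_test).mean())
```

Because this projection is linear, out-of-sample instances are handled by reusing `mean` and `W`; the nonlinear reducers mentioned in the abstract have no such matrix, which is why they need a separate reduction estimator. The sketch builds a full features-by-features scatter matrix, which is only practical for a moderate number of features.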
Pages: 20
Related papers
50 in total
  • [1] Feature Selection in Imbalanced Data
    Kamalov F.
    Thabtah F.
    Leung H.H.
    [J]. Annals of Data Science, 2023, 10 (6) : 1527 - 1541
  • [2] When is resampling beneficial for feature selection with imbalanced wide data?
    Ramos-Perez, Ismael
    Arnaiz-Gonzalez, Alvar
    Rodriguez, Juan J.
    Garcia-Osorio, Cesar
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 188
  • [3] A Comparison between Two Feature Selection Algorithms
    Bancioiu, Camil
    Vintan, Lucian
    [J]. 2017 21ST INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC), 2017, : 242 - 247
  • [4] Univariate feature selection on imbalanced data
    Chatterjee, Avishek
    Woodruff, Henry
    Lobbes, Marc
    Vallieres, Martin
    Seuntjens, Jan
    [J]. MEDICAL PHYSICS, 2019, 46 (11) : 5375 - 5375
  • [5] Evolutionary feature selection for imbalanced data
    Tusell Rey, Claudia C.
    Salinas Garcia, Viridiana
    Villuendas-Rey, Yenny
    [J]. 2023 MEXICAN INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE, ENC, 2024,
  • [6] Causal Feature Selection With Imbalanced Data
    Ling, Zhaolong
    Wu, Jingxuan
    Zhang, Yiwen
    Zhou, Peng
    Yu, Kui
    Jiang, Bingbing
    Wu, Xindong
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024,
  • [7] An Evaluation of Feature Selection and Reduction Algorithms for Network IDS Data
    Bjerkestrand, Therese
    Tsaptsinos, Dimitris
    Pfluegel, Eckhard
    [J]. 2015 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015,
  • [8] Comparison of metrics for feature selection in imbalanced text classification
    Ogura, Hiroshi
    Amano, Hiromi
    Kondo, Masato
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 4978 - 4989
  • [9] An Embedded Feature Selection Method for Imbalanced Data Classification
    Liu, Haoyue
    Zhou, MengChu
    Liu, Qing
    [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2019, 6 (03) : 703 - 715
  • [10] Feature Selection with Imbalanced Data for Software Defect Prediction
    Khoshgoftaar, Taghi M.
    Gao, Kehan
    [J]. EIGHTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2009, : 235 - +