Ensemble learning-based filter-centric hybrid feature selection framework for high-dimensional imbalanced data

Cited by: 23
Authors
Kim, Jongmo [1]
Kang, Jaewoong [1]
Sohn, Mye [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Hybrid feature selection; Ensemble feature selection; Multiple classifiers; Robust feature subset; High-dimensional imbalanced data; DIVERSITY MEASURES; DATA-SETS; ROBUST;
DOI
10.1016/j.knosys.2021.106901
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, research on feature selection for high-dimensional imbalanced data has attracted a considerable amount of attention. The filter-wrapper hybrid method, which is a conventional method of feature selection for high-dimensional data, aims to reduce excessive computational time. On the other hand, ensemble learning-based feature selection, even though it has a high level of computational complexity, focuses exclusively on the discovery of robust features. From this perspective, combining these two feature selection methods is not easy. However, a combined method is essential to advancing machine learning research that addresses real-world problems. We propose a filter-centric hybrid method based on ensemble learning that can select the best feature subset for high-dimensional imbalanced data. The basic concept of the proposed method is to design a feature evaluation scheme based on the filter method and to apply ensemble learning with reasonable computational time. To achieve this objective, our method uses predictions produced by multiple classifiers as inputs to the feature evaluation function. As a result, it can reflect the predictive performance of the classifiers and overcome the low performance of features selected by filter methods. In addition, it can find robust features simultaneously. To demonstrate the superiority of the proposed method, we perform various experiments using 14 experimental datasets that consist of low-dimensional balanced, high-dimensional balanced, and high-dimensional imbalanced datasets. Finally, we compare the proposed method with state-of-the-art feature selection methods. (c) 2021 Elsevier B.V. All rights reserved.
Pages: 25
Related papers
50 in total
  • [31] Feature selection for high-dimensional data
    Destrero A.
    Mosci S.
    De Mol C.
    Verri A.
    Odone F.
    [J]. Computational Management Science, 2009, 6 (1) : 25 - 40
  • [32] A Novel Feature Selection-Based Sequential Ensemble Learning Method for Class Noise Detection in High-Dimensional Data
    Chen, Kai
    Guan, Donghai
    Yuan, Weiwei
    Li, Bohan
    Khattak, Asad Masood
    Alfandi, Omar
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2018, 2018, 11323 : 55 - 65
  • [33] Classifier Ensemble Based on Multiview Optimization for High-Dimensional Imbalanced Data Classification
    Xu, Yuhong
    Yu, Zhiwen
    Chen, C. L. Philip
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (01) : 870 - 883
  • [34] Ensemble of Trees for Classifying High-Dimensional Imbalanced Genomic Data
    Farid, Dewan Md.
    Nowe, Ann
    Manderick, Bernard
    [J]. PROCEEDINGS OF SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) 2016, VOL 1, 2018, 15 : 172 - 187
  • [35] Hybrid Filter and Genetic Algorithm-Based Feature Selection for Improving Cancer Classification in High-Dimensional Microarray Data
    Ali, Waleed
    Saeed, Faisal
    [J]. PROCESSES, 2023, 11 (02)
  • [36] The Hybrid Filter Feature Selection Methods for Improving High-Dimensional Text Categorization
    Le Nguyen Hoai Nam
    Ho Bao Quoc
    [J]. INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2017, 25 (02) : 235 - 265
  • [37] Feature selection for high-dimensional data: A Kolmogorov-Smirnov correlation-based filter
    Biesiada, J
    Duch, W
    [J]. COMPUTER RECOGNITION SYSTEMS, PROCEEDINGS, 2005, : 95 - 103
  • [38] Feature selection based on geometric distance for high-dimensional data
    Lee, J. -H.
    Oh, S. -Y.
    [J]. ELECTRONICS LETTERS, 2016, 52 (06) : 473 - 474
  • [39] Scalable Feature Selection in High-Dimensional Data Based on GRASP
    Moshki, Mohsen
    Kabiri, Peyman
    Mohebalhojeh, Alireza
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2015, 29 (03) : 283 - 296
  • [40] Online Streaming Feature Selection for High-Dimensional and Class-Imbalanced Data Based on Neighborhood Rough Set
    Chen, Xiangyan
    Lin, Yaojin
    Wang, Chenxi
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (08) : 726 - 735