Ensemble learning-based filter-centric hybrid feature selection framework for high-dimensional imbalanced data

Cited by: 23
Authors
Kim, Jongmo [1]
Kang, Jaewoong [1]
Sohn, Mye [1]
Affiliation
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Hybrid feature selection; Ensemble feature selection; Multiple classifiers; Robust feature subset; High-dimensional imbalanced data; DIVERSITY MEASURES; DATA-SETS; ROBUST;
DOI
10.1016/j.knosys.2021.106901
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, research on feature selection for high-dimensional imbalanced data has attracted considerable attention. The filter-wrapper hybrid method, a conventional approach to feature selection for high-dimensional data, aims to reduce excessive computational time. Ensemble learning-based feature selection, by contrast, focuses exclusively on discovering robust features, even though it has high computational complexity. Because the two approaches pursue different goals, combining them is not easy. However, such a combined method is essential to advancing machine learning research that addresses real-world problems. We propose an ensemble learning-based, filter-centric hybrid method that can select the best feature subset for high-dimensional imbalanced data. The basic concept of the proposed method is to design a feature evaluation scheme based on the filter method and to apply ensemble learning within a reasonable computational time. To achieve this, the proposed method uses the predictions produced by multiple classifiers as inputs to the feature evaluation function. As a result, it can reflect the predictive performance of the classifiers and overcome the low performance of features selected by filter methods, while simultaneously finding robust features. To demonstrate the superiority of the proposed method, we perform various experiments on 14 datasets comprising low-dimensional balanced, high-dimensional balanced, and high-dimensional imbalanced datasets. Finally, we compare the proposed method with state-of-the-art feature selection methods. (c) 2021 Elsevier B.V. All rights reserved.
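The abstract does not specify the feature evaluation function, so the Python sketch below only illustrates the general idea of feeding several classifiers' predictions into a filter-style score; it is not the authors' method. Every name and design choice in it is a hypothetical assumption: two toy classifiers (`nearest_centroid_predict`, `knn_predict`), leave-one-out predictions, a Fisher score computed against each classifier's predicted labels, and balanced accuracy as the per-classifier weight, chosen because it is less sensitive to class imbalance than plain accuracy.

```python
from statistics import mean, pvariance

# Hypothetical sketch of an ensemble-informed filter score; NOT the paper's method.


def nearest_centroid_predict(train_X, train_y, x):
    """Toy classifier: predict the label whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def knn_predict(train_X, train_y, x, k=3):
    """Toy classifier: majority vote of the k nearest rows (ties -> smaller label)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)


def loo_predictions(predict, X, y):
    """Leave-one-out predictions, so no sample is predicted by a model trained on it."""
    return [predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) for i in range(len(X))]


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; less sensitive to class imbalance than plain accuracy."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return mean(recalls)


def fisher_score(values, labels):
    """Between-class over within-class scatter of one feature for the given labels."""
    overall = mean(values)
    between = within = 0.0
    for label in set(labels):
        group = [v for v, t in zip(values, labels) if t == label]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / (within + 1e-12)  # epsilon guards against zero within-class scatter


def ensemble_filter_scores(X, y, classifiers):
    """Average per-feature Fisher scores computed against each classifier's
    leave-one-out predictions, weighted by that classifier's balanced accuracy."""
    columns = list(zip(*X))
    scores = [0.0] * len(columns)
    total_weight = 0.0
    for predict in classifiers:
        preds = loo_predictions(predict, X, y)
        weight = balanced_accuracy(y, preds)
        total_weight += weight
        for j, col in enumerate(columns):
            scores[j] += weight * fisher_score(col, preds)
    return [s / total_weight for s in scores] if total_weight else scores


# Tiny imbalanced example: 4 majority (0) vs 2 minority (1) samples, 3 features.
X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.9], [0.9, 6.0, 0.1],
     [1.1, 5.5, 0.7], [3.0, 5.2, 0.4], [3.2, 4.8, 0.6]]
y = [0, 0, 0, 0, 1, 1]
scores = ensemble_filter_scores(X, y, [nearest_centroid_predict, knn_predict])
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
print(ranking)  # → [0, 1, 2]
```

In this toy data the 3-nearest-neighbour classifier predicts the majority class for every left-out sample, so its balanced accuracy is 0.5 and its predicted labels contribute no Fisher signal; the ranking is driven by the nearest-centroid classifier. Weighting by classifier performance is one plausible way for a score to reflect the classifiers' predictive performance, but other designs are equally possible.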
Pages: 25
Related papers
50 in total
  • [21] An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data
    Yu, Hualong
    Ni, Jun
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (04) : 657 - 666
  • [22] A new hybrid ensemble feature selection framework for machine learning-based phishing detection system
    Chiew, Kang Leng
    Tan, Choon Lin
    Wong, KokSheik
    Yong, Kelvin S. C.
    Tiong, Wei King
    [J]. INFORMATION SCIENCES, 2019, 484 : 153 - 166
  • [23] Hybrid fast unsupervised feature selection for high-dimensional data
    Manbari, Zhaleh
    AkhlaghianTab, Fardin
    Salavati, Chiman
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 124 : 97 - 118
  • [24] Local-Learning-Based Feature Selection for High-Dimensional Data Analysis
    Sun, Yijun
    Todorovic, Sinisa
    Goodison, Steve
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2010, 32 (09) : 1610 - 1626
  • [25] A new improved filter-based feature selection model for high-dimensional data
    Munirathinam, Deepak Raj
    Ranganadhan, Mohanasundaram
    [J]. JOURNAL OF SUPERCOMPUTING, 2020, 76 (08) : 5745 - 5762
  • [27] Benchmark for filter methods for feature selection in high-dimensional classification data
    Bommert, Andrea
    Sun, Xudong
    Bischl, Bernd
    Rahnenfuehrer, Joerg
    Lang, Michel
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 143
  • [28] A novel feature learning framework for high-dimensional data classification
    Li, Yanxia
    Chai, Yi
    Yin, Hongpeng
    Chen, Bo
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (02) : 555 - 569
  • [30] Feature selection for high-dimensional data
    Bolón-Canedo, V.
    Sánchez-Maroño, N.
    Alonso-Betanzos, A.
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, 2016, 5 (02) : 65 - 75