Ensemble learning-based filter-centric hybrid feature selection framework for high-dimensional imbalanced data

被引:23
|
作者
Kim, Jongmo [1 ]
Kang, Jaewoong [1 ]
Sohn, Mye [1 ]
机构
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon, South Korea
基金
新加坡国家研究基金会;
关键词
Hybrid feature selection; Ensemble feature selection; Multiple classifiers; Robust feature subset; High-dimensional imbalanced data; DIVERSITY MEASURES; DATA-SETS; ROBUST;
D O I
10.1016/j.knosys.2021.106901
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In recent years, research on feature selection for high-dimensional imbalanced data has attracted a considerable amount of attention. The filter-wrapper hybrid method, which is a conventional method of feature selection for high-dimensional data, aims to reduce excessive computational time. On the other hand, ensemble learning-based feature selection, even though it has a high level of computational complexity, focuses exclusively on the discovery of robust features. From this perspective, combining these two feature selection methods is not easy. However, a combined method is essential to advancing machine learning research that addresses real-world problems. We propose an filter-centric hybrid method based on ensemble-learning that can select the best feature subset for high-dimensional imbalanced data. The basic concept of the proposed method is to design a feature evaluation scheme based on the filter method and to apply ensemble learning with reasonable computational time. To achieve this objective, our innovative method utilizes predictions produced by multiple classifiers as inputs of the feature evaluation function. As a result, it can reflect the predictive performance of the classifiers and overcome the low performance of selected features by filter methods. In addition, it can find robust features simultaneously. To demonstrate the superiority of the proposed method, we perform various experiments using 14 experimental datasets that consist of low-dimensional balanced, high-dimensional balanced, and high-dimensional imbalanced datasets. Finally, we compare the proposed method with state-of-the-art feature selection methods. (c) 2021 Elsevier B.V. All rights reserved.
引用
收藏
页数:25
相关论文
共 50 条
  • [1] Feature selection for high-dimensional imbalanced data
    Yin, Liuzhi
    Ge, Yong
    Xiao, Keli
    Wang, Xuehua
    Quan, Xiaojun
    [J]. NEUROCOMPUTING, 2013, 105 : 3 - 11
  • [2] Feature Selection with High-Dimensional Imbalanced Data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    Wald, Randall
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 507 - 514
  • [3] A hybrid feature selection approach based on ensemble method for high-dimensional data
    Rouhi, Amirreza
    Nezamabadi-pour, Hossein
    [J]. 2017 2ND CONFERENCE ON SWARM INTELLIGENCE AND EVOLUTIONARY COMPUTATION (CSIEC), 2017, : 16 - 20
  • [4] Distributed Ensemble Feature Selection Framework for High-Dimensional and High-Skewed Imbalanced Big Dataset
    Soheili, Majid
    Haeri, Maryam Amir Amir
    [J]. 2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [5] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    [J]. JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [6] A hybrid feature weighting and selection-based strategy to classify the high-dimensional and imbalanced medical data
    Harpreet Singh
    Manpreet Kaur
    Birmohan Singh
    [J]. Neural Computing and Applications, 2024, 36 (20) : 12299 - 12316
  • [7] A Hybrid Ensemble Feature Selection-Based Learning Model for COPD Prediction on High-Dimensional Feature Space
    Banda, Srinivas Raja Banda
    Babu, Tummala Ranga
    [J]. DATA ENGINEERING AND COMMUNICATION TECHNOLOGY, ICDECT-2K19, 2020, 1079 : 663 - 675
  • [8] A GA-BASED FEATURE SELECTION AND ENSEMBLE LEARNING FOR HIGH-DIMENSIONAL DATASETS
    Xia, Pei-Yong
    Ding, Xiang-Qian
    Jiang, Bai-Ning
    [J]. PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 7 - +
  • [9] An ensemble feature selection method for high-dimensional data based on sort aggregation
    Wang, Jie
    Xu, Jing
    Zhao, Chengan
    Peng, Yan
    Wang, Hongpeng
    [J]. SYSTEMS SCIENCE & CONTROL ENGINEERING, 2019, 7 (02) : 32 - 39
  • [10] Online feature selection for high-dimensional class-imbalanced data
    Zhou, Peng
    Hu, Xuegang
    Li, Peipei
    Wu, Xindong
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 136 : 187 - 199