Ensemble feature selection for high dimensional data: a new method and a comparative study

Cited by: 70
Authors
Ben Brahim, Afef [1]
Limam, Mohamed [2]
Affiliations
[1] Univ Tunis, Tunis Business Sch, LARODEC, BP 65, Bir El Kassaa 2059, Tunisia
[2] Dhofar Univ, Salalah, Oman
Keywords
Feature selection; Ensemble methods; Classification; Stability; High dimensionality; Microarray data; Cancer; Aggregation; Prediction
DOI
10.1007/s11634-017-0285-y
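Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract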
The curse of dimensionality is based on the fact that high dimensional data is often difficult to work with. A large number of features can increase the noise of the data and thus the error of a learning algorithm. Feature selection is a solution for such problems where there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and might give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on feature selectors' reliability assessment. It aims at providing a unique and stable feature selection without ignoring the predictive accuracy aspect. A classification algorithm is used as an evaluator to assign a confidence to features selected by ensemble members based on their associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high dimensional data sets.
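The aggregation idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the three scoring functions, the nearest-centroid evaluator, the holdout split, and the rule that sums reliabilities are all assumptions chosen for illustration, as are the function names and the synthetic data. It only shows the general pattern of weighting each ensemble member's selected features by how well a classifier performs on them.

```python
import random
from statistics import mean, median, pstdev

def make_data(n=120, p=30, informative=(0, 1, 2), shift=3.0, seed=0):
    """Toy two-class data; only the `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in informative:
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y

def by_class(X, y, j):
    """Values of feature j, split by class label."""
    a = [r[j] for r, t in zip(X, y) if t == 0]
    b = [r[j] for r, t in zip(X, y) if t == 1]
    return a, b

# Three hypothetical filter scores standing in for the ensemble members.
def mean_diff(X, y, j):
    a, b = by_class(X, y, j)
    return abs(mean(a) - mean(b))

def median_diff(X, y, j):
    a, b = by_class(X, y, j)
    return abs(median(a) - median(b))

def scaled_diff(X, y, j):
    a, b = by_class(X, y, j)
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def top_k(score, X, y, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(X[0])), key=lambda j: -score(X, y, j))[:k]

def holdout_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Nearest-centroid accuracy using only the features in `feats`."""
    cents = {}
    for c in (0, 1):
        rows = [r for r, t in zip(X_tr, y_tr) if t == c]
        cents[c] = [mean(r[j] for r in rows) for j in feats]
    correct = 0
    for r, t in zip(X_te, y_te):
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(feats, cents[c]))
                for c in cents}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_te)

def ensemble_select(X, y, k=3):
    """Weight each member's subset by its holdout accuracy, then keep the top k."""
    # Simple deterministic split that keeps both classes on each side.
    tr = [i for i in range(len(X)) if i % 4 < 2]
    te = [i for i in range(len(X)) if i % 4 >= 2]
    X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
    X_te, y_te = [X[i] for i in te], [y[i] for i in te]
    confidence = {}
    for score in (mean_diff, median_diff, scaled_diff):
        subset = top_k(score, X_tr, y_tr, k)
        reliability = holdout_accuracy(X_tr, y_tr, X_te, y_te, subset)
        for j in subset:
            confidence[j] = confidence.get(j, 0.0) + reliability
    ranked = sorted(confidence, key=lambda j: (-confidence[j], j))
    return ranked[:k], confidence

X, y = make_data()
selected, confidence = ensemble_select(X, y)
print("selected features:", selected)
```

On this toy data the three members tend to agree, so the aggregation step has little to do; with real high-dimensional data the members typically disagree, and the reliability weights then decide which features survive.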
Pages: 937-952
Page count: 16
Related papers
50 in total
  • [1] Ensemble feature selection for high dimensional data: a new method and a comparative study
    Afef Ben Brahim
    Mohamed Limam
    [J]. Advances in Data Analysis and Classification, 2018, 12 : 937 - 952
  • [2] An ensemble feature selection method for high-dimensional data based on sort aggregation
    Wang, Jie
    Xu, Jing
    Zhao, Chengan
    Peng, Yan
    Wang, Hongpeng
    [J]. SYSTEMS SCIENCE & CONTROL ENGINEERING, 2019, 7 (02) : 32 - 39
  • [3] A hybrid feature selection approach based on ensemble method for high-dimensional data
    Rouhi, Amirreza
    Nezamabadi-pour, Hossein
    [J]. 2017 2ND CONFERENCE ON SWARM INTELLIGENCE AND EVOLUTIONARY COMPUTATION (CSIEC), 2017, : 16 - 20
  • [4] Stability Investigation of Ensemble Feature Selection for High Dimensional Data Analytics
    Sumant, Archana Shivdas
    Patil, Dipak
    [J]. THIRD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND CAPSULE NETWORKS (ICIPCN 2022), 2022, 514 : 801 - 815
  • [5] A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification
    Piao, Yongjun
    Piao, Minghao
    Jin, Cheng Hao
    Shon, Ho Sun
    Chung, Ji-Moon
    Hwang, Buhyun
    Ryu, Keun Ho
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [6] New heuristics in feature selection for high dimensional data
    Ruiz, Roberto
    [J]. AI COMMUNICATIONS, 2007, 20 (02) : 129 - 131
  • [7] Prediction of Skin Disease Using Ensemble Data Mining Techniques and Feature Selection Method—a Comparative Study
    Anurag Kumar Verma
    Saurabh Pal
    Surjeet Kumar
    [J]. Applied Biochemistry and Biotechnology, 2020, 190 : 341 - 359
  • [8] Stratified feature sampling method for ensemble clustering of high dimensional data
    Jing, Liping
    Tian, Kuang
    Huang, Joshua Z.
    [J]. PATTERN RECOGNITION, 2015, 48 (11) : 3688 - 3702
  • [9] Exploiting the ensemble paradigm for stable feature selection: A case study on high-dimensional genomic data
    Pes, Barbara
    Dessi, Nicoletta
    Angioni, Marta
    [J]. INFORMATION FUSION, 2017, 35 : 132 - 147
  • [10] A hybrid feature selection method for high-dimensional data
    Taheri, Nooshin
    Nezamabadi-pour, Hossein
    [J]. 2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 141 - 145