A Group Feature Ranking and Selection Method Based on Dimension Reduction Technique in High-Dimensional Data

Cited by: 5
Authors
Zubair, Iqbal Muhammad [1]
Kim, Byunghoon [1]
Affiliations
[1] Hanyang Univ, Dept Ind & Management Engn, Ansan, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Funding
National Research Foundation of Singapore;
Keywords
Dimension reduction; feature extraction; group feature ranking; group feature selection; high dimensional data; CANCER; ROBUST; REGRESSION; ENSEMBLE; GENES;
DOI
10.1109/ACCESS.2022.3225685
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Group feature selection methods reduce model complexity by retaining important feature groups and removing irrelevant ones. To the best of our knowledge, few group feature selection methods provide the relative importance of each feature group. To address this, we developed a sparse group feature ranking method for high-dimensional data based on a dimension reduction technique. First, we apply Relief to each group to remove irrelevant individual features. Second, we extract a new feature that represents each feature group: we reduce the multiple dimensions of each group to a single dimension by applying Fisher linear discriminant analysis (FDA) to that group. Finally, we estimate the relative importance of the extracted features with a random forest and select the features whose importance scores are larger than those of the others. Machine-learning algorithms can then be trained and tested on the selected features. In the experiments, we compared the proposed method with the supervised group lasso (SGL) method on real-life high-dimensional datasets. The results show that the proposed method selects a few important feature groups, as the existing group feature selection method does, and additionally provides the ranking and relative importance of all feature groups. In terms of classification performance metrics, SGL performs slightly better with logistic regression, whereas the proposed method performs better with support vector machine, random forest, and gradient boosting.
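The three steps described in the abstract (a per-group Relief filter, FDA projection of each group to one dimension, and random forest ranking of the extracted features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which Relief variant, how many features are kept per group, or what importance threshold is used, so the basic binary Relief in `relief_scores`, the `keep_per_group` parameter, the function name `rank_groups`, and the synthetic data are all assumptions. scikit-learn's `LinearDiscriminantAnalysis` stands in for FDA.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier


def relief_scores(X, y):
    """Basic binary Relief (assumed variant): a feature scores higher when it
    differs at the nearest miss and agrees at the nearest hit."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0              # avoid division by zero
    Z = (X - lo) / span                # scale every feature to [0, 1]
    w = np.zeros(Z.shape[1])
    for i in range(Z.shape[0]):
        d = np.abs(Z - Z[i]).sum(axis=1)   # L1 distance to every sample
        d[i] = np.inf                      # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        w += np.abs(Z[i] - Z[miss]) - np.abs(Z[i] - Z[hit])
    return w / Z.shape[0]


def rank_groups(X, y, groups, keep_per_group=3, seed=0):
    """groups maps a group name to its column indices (hypothetical layout).
    Returns (ranking, Z): groups sorted by importance, and the extracted
    one-dimensional feature per group."""
    names, columns = [], []
    for name, idx in groups.items():
        idx = np.asarray(idx)
        # Step 1: keep the highest-scoring individual features of the group.
        scores = relief_scores(X[:, idx], y)
        kept = idx[np.argsort(scores)[::-1][:keep_per_group]]
        # Step 2: project the kept features onto one discriminant axis.
        lda = LinearDiscriminantAnalysis(n_components=1)
        columns.append(lda.fit_transform(X[:, kept], y).ravel())
        names.append(name)
    Z = np.column_stack(columns)
    # Step 3: random forest importances give the relative group importance.
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(Z, y)
    imp = rf.feature_importances_
    order = np.argsort(imp)[::-1]
    return [(names[i], float(imp[i])) for i in order], Z


# Synthetic demo data (an assumption): 4 groups of 5 features,
# only the first 3 columns of group "g0" carry class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 20))
X[:, :3] += 2.0 * y[:, None]
groups = {f"g{k}": list(range(5 * k, 5 * k + 5)) for k in range(4)}

ranking, Z = rank_groups(X, y, groups)
for name, score in ranking:
    print(name, round(score, 3))
```

In this sketch the discriminant axis is fitted and applied on the same samples, so uninformative groups can look slightly better than they are; fitting the filter and projection on a training split only would be the more careful choice when the ranking feeds a downstream classifier.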
Pages: 125136-125147
Page count: 12