Benchmark of filter methods for feature selection in high-dimensional gene expression survival data

Cited by: 60
Authors
Bommert, Andrea [1 ]
Welchowski, Thomas [2 ]
Schmid, Matthias [2 ]
Rahnenfuehrer, Joerg [1 ]
Affiliations
[1] TU Dortmund Univ, Dept Stat, Vogelpothsweg 87, D-44227 Dortmund, Germany
[2] Univ Bonn, Med Fac, Inst Med Biometry Informat & Epidemiol IMBIE, Bonn, Germany
Keywords
benchmark; feature selection; filter methods; high-dimensional data; survival analysis; MICROARRAY DATA; MUTUAL INFORMATION; ALGORITHMS; MODEL; CLASSIFICATION; REGULARIZATION
DOI
10.1093/bib/bbab354
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Feature selection is crucial for the analysis of high-dimensional data, but benchmark studies for data with a survival outcome are rare. We compare 14 filter methods for feature selection based on 11 high-dimensional gene expression survival data sets. The aim is to provide guidance on the choice of filter methods for other researchers and practitioners. We analyze the accuracy of predictive models that employ the features selected by the filter methods. Also, we consider the run time, the number of selected features needed to fit models with high predictive accuracy, and the feature selection stability. We conclude that the simple variance filter outperforms all other filter methods considered. This filter selects the features with the largest variance and does not take into account the survival outcome. Also, we identify the correlation-adjusted regression scores filter as a more elaborate alternative that allows fitting models with similar predictive accuracy. Additionally, we investigate the filter methods based on feature rankings, finding groups of similar filters.
Pages: 13
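The variance filter described in the abstract is simple enough to sketch in a few lines. The following is a minimal, hypothetical illustration in plain Python, assuming an expression matrix stored as a list of sample rows; it is not the authors' implementation. The helper names `variance_filter`, `jaccard` and `selection_stability`, the row-subsampling scheme and the use of mean pairwise Jaccard similarity as a stability measure are all illustrative assumptions. The benchmark's own stability measure may differ.

```python
import random
import statistics
from itertools import combinations


def variance_filter(X, k):
    # X: list of samples (rows), each a list of expression values.
    # The survival outcome is deliberately not an argument: the filter is unsupervised.
    n_features = len(X[0])
    variances = [statistics.variance([row[j] for row in X]) for j in range(n_features)]
    # Rank features by decreasing sample variance; ties go to the lower column index.
    ranking = sorted(range(n_features), key=lambda j: (-variances[j], j))
    return set(ranking[:k])


def jaccard(a, b):
    # Jaccard similarity |A ∩ B| / |A ∪ B| of two selected feature sets.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(X, k, n_runs=10, frac=0.8, seed=0):
    # Hypothetical stability check: rerun the filter on random row subsamples
    # and average the pairwise Jaccard similarities of the selected sets.
    rng = random.Random(seed)
    n = len(X)
    m = max(2, int(frac * n))
    selections = []
    for _ in range(n_runs):
        rows = rng.sample(range(n), m)
        selections.append(variance_filter([X[i] for i in rows], k))
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical values): 6 samples x 4 genes.
X = [
    [1.0, 5.0, 0.1, 2.0],
    [1.1, 9.0, 0.2, 2.0],
    [0.9, 1.0, 0.1, 2.1],
    [1.0, 7.0, 0.3, 1.9],
    [1.2, 2.0, 0.2, 2.0],
    [0.8, 8.0, 0.1, 2.0],
]
print(sorted(variance_filter(X, 2)))  # → [0, 1]
print(selection_stability(X, 1))      # → 1.0
```

Because the outcome is never consulted, the filter costs a single pass over the feature columns, which is consistent with the abstract's emphasis on run time. The number of selected features `k` is left as a free parameter here; in practice it would be chosen with respect to the predictive accuracy of the downstream survival model.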