Benchmark of filter methods for feature selection in high-dimensional gene expression survival data

Cited by: 60
Authors
Bommert, Andrea [1]
Welchowski, Thomas [2]
Schmid, Matthias [2]
Rahnenfuehrer, Joerg [1]
Affiliations
[1] TU Dortmund Univ, Dept Stat, Vogelpothsweg 87, D-44227 Dortmund, Germany
[2] Univ Bonn, Med Fac, Inst Med Biometry Informat & Epidemiol IMBIE, Bonn, Germany
Keywords
benchmark; feature selection; filter methods; high-dimensional data; survival analysis; microarray data; mutual information; algorithms; model; classification; regularization
DOI
10.1093/bib/bbab354
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Feature selection is crucial for the analysis of high-dimensional data, but benchmark studies for data with a survival outcome are rare. We compare 14 filter methods for feature selection based on 11 high-dimensional gene expression survival data sets. The aim is to provide guidance on the choice of filter methods for other researchers and practitioners. We analyze the accuracy of predictive models that employ the features selected by the filter methods. Also, we consider the run time, the number of selected features for fitting models with high predictive accuracy as well as the feature selection stability. We conclude that the simple variance filter outperforms all other considered filter methods. This filter selects the features with the largest variance and does not take into account the survival outcome. Also, we identify the correlation-adjusted regression scores filter as a more elaborate alternative that allows fitting models with similar predictive accuracy. Additionally, we investigate the filter methods based on feature rankings, finding groups of similar filters.
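The variance filter described in the abstract uses no survival information at all, which makes it easy to sketch. What follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `variance_filter` and the parameter `k` (the number of features kept) are hypothetical, and features are scored on their raw, unstandardized expression values, because standardizing first would give every feature variance 1 and make the filter meaningless.

```python
import numpy as np


def variance_filter(X, k):
    """Hypothetical sketch: rank features by sample variance, keep the k largest.

    X is an (n_samples, n_features) matrix of expression values. The
    survival outcome (times, event indicators) is deliberately not an
    argument: the variance filter ignores it.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if n_samples < 2:
        raise ValueError("need at least two samples to estimate a variance")
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and the number of features")
    # Per-feature sample variance (ddof=1); no standardization beforehand,
    # since standardized features would all have variance 1.
    scores = X.var(axis=0, ddof=1)
    # Sort by decreasing variance; kind="stable" keeps ties in column order.
    order = np.argsort(-scores, kind="stable")
    return order[:k], scores


if __name__ == "__main__":
    # Toy data: 3 samples, 3 features with variances 1, 100 and 0.
    X = np.array([[1.0, 10.0, 5.0],
                  [2.0, 20.0, 5.0],
                  [3.0, 30.0, 5.0]])
    selected, scores = variance_filter(X, k=2)
    print(selected)  # [1 0]
```

Because the scores depend only on `X`, the ranking can be computed once and reused for every candidate value of `k`; choosing `k` itself (for example by the predictive accuracy of a survival model fitted on the retained features) is a separate step not shown here.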
Pages: 13