The γ-OMP Algorithm for Feature Selection With Application to Gene Expression Data

Times cited: 3
Authors
Tsagris, Michail [1]
Papadovasilakis, Zacharias [2]
Lakiotaki, Kleanthi [2]
Tsamardinos, Ioannis [2, 3]
Affiliations
[1] Univ Crete, Dept Econ, Rethimnon 74100, Greece
[2] Univ Crete, Dept Comp Sci, Iraklion 70013, Greece
[3] FORTH, Inst Appl & Computat Math, Iraklion 70013, Greece
Funding
European Research Council
Keywords
Feature selection; high dimensional data; bioinformatics; omics data; gene expression data; LINEAR-MODELS; REGRESSION; LASSO
DOI
10.1109/TCBB.2020.3029952
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To be applicable to molecular data, feature selection algorithms need to scale to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly scalable Orthogonal Matching Pursuit (OMP) feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, and time-to-event; (b) discrete (categorical) features; (c) different statistically based stopping criteria; (d) several predictive models (e.g., linear or logistic regression); (e) various types of residuals; and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par with or outperforms LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event analysis (censored survival times). γ-OMP is based on simple statistical ideas and is easy to implement and extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.
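The abstract describes an OMP-style greedy loop: pick the feature most associated with the current residuals, refit, and stop when a statistical criterion says the new feature no longer helps. The sketch below illustrates that loop for just one of the combinations the abstract lists, under loudly stated assumptions: a continuous outcome, a linear regression model, raw residuals, absolute Pearson correlation as the association measure, and a Gaussian log-likelihood-ratio statistic compared against a threshold `tol` (assumed default 3.84, the 95% quantile of the chi-square distribution with 1 degree of freedom). It is not the authors' implementation; the name `gamma_omp_sketch` and its parameters are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import List, Optional, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _abs_corr(r: Sequence[float], x: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    mr = sum(r) / len(r)
    mx = sum(x) / len(x)
    rc = [v - mr for v in r]
    xc = [v - mx for v in x]
    den = math.sqrt(_dot(rc, rc) * _dot(xc, xc))
    return abs(_dot(rc, xc)) / den if den > 0 else 0.0


def gamma_omp_sketch(
    X_cols: List[List[float]],
    y: List[float],
    tol: float = 3.841458820694124,  # assumption: 95% quantile of chi-square(1)
    max_features: Optional[int] = None,
) -> List[int]:
    """Hypothetical OMP-style selection for a continuous outcome.

    X_cols holds one list per feature. Returns the indices of the selected
    features in the order they were added.
    """
    n = len(y)
    # Orthonormal basis of the current model's design; start with the intercept.
    q0 = [1.0 / math.sqrt(n)] * n
    basis = [q0]
    c0 = _dot(q0, y)
    resid = [yi - c0 * qi for yi, qi in zip(y, q0)]
    rss = _dot(resid, resid)
    selected: List[int] = []
    limit = max_features if max_features is not None else min(len(X_cols), n - 2)

    # Stop early if the residuals are already (numerically) zero.
    while len(selected) < limit and rss > 1e-12:
        candidates = [j for j in range(len(X_cols)) if j not in selected]
        if not candidates:
            break
        # 1. Association step: feature most correlated with the current residuals.
        best = max(candidates, key=lambda j: _abs_corr(resid, X_cols[j]))
        # 2. Refit step: Gram-Schmidt the chosen column against the basis, so the
        #    new residuals equal those of the least-squares fit on all selected columns.
        v = list(X_cols[best])
        for q in basis:
            c = _dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm < 1e-10:
            break  # the best candidate adds no new direction
        q_new = [vi / norm for vi in v]
        c = _dot(q_new, resid)
        new_resid = [ri - c * qi for ri, qi in zip(resid, q_new)]
        new_rss = _dot(new_resid, new_resid)
        # 3. Stopping step: Gaussian log-likelihood-ratio statistic n*log(RSS_old/RSS_new).
        stat = math.inf if new_rss <= 0 else n * math.log(rss / new_rss)
        if stat < tol:
            break
        selected.append(best)
        basis.append(q_new)
        resid, rss = new_resid, new_rss
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 100, 6
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [3 * X[0][i] - 2 * X[2][i] + 0.5 * rng.gauss(0, 1) for i in range(n)]
    print(gamma_omp_sketch(X, y))  # features 0 and 2 are expected among the first picks
```

The Gram-Schmidt shortcut in the refit step works only for least squares. Supporting the other outcome types and models the abstract lists (e.g. logistic regression) would require refitting the chosen model at each step and computing the corresponding residuals and test statistic, which is where the generality of γ-OMP comes in.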
Pages: 1214-1224
Number of pages: 11
Related papers
  • [1] A hybrid feature selection algorithm for gene expression data classification
    Lu, Huijuan
    Chen, Junying
    Yan, Ke
    Jin, Qun
    Xue, Yu
    Gao, Zhigang
    [J]. NEUROCOMPUTING, 2017, 256 : 56 - 62
  • [2] Incremental forward feature selection with application to microarray gene expression data
    Lee, Yuh-Jye
    Chang, Chien-Chung
    Chao, Chia-Huang
    [J]. JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2008, 18 (05) : 827 - 840
  • [3] An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data
    Ahmed, Saeed
    Kabir, Muhammad
    Ali, Zakir
    Arif, Muhammad
    Ali, Farman
    Yu, Dong-Jun
    [J]. COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2018, 21 (09) : 631 - 645
  • [4] A Top-r Feature Selection Algorithm for Microarray Gene Expression Data
    Sharma, Alok
    Imoto, Seiya
    Miyano, Satoru
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (03) : 754 - 764
  • [5] Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data
    Yousef, Malik
    Kumar, Abhishek
    Bakir-Gungor, Burcu
    [J]. ENTROPY, 2021, 23 (01) : 1 - 15
  • [6] Improving feature subset selection using a genetic algorithm for microarray gene expression data
    Tan, Feng
    Fu, Xuezheng
    Zhang, Yanqing
    Bourgeois, Anu G.
    [J]. 2006 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-6, 2006 : 2514 - 2519
  • [7] Improved Binary Imperialist Competition Algorithm for Feature Selection from Gene Expression Data
    Aorigele
    Wang, Shuaiqun
    Tang, Zheng
    Gao, Shangce
    Todo, Yuki
    [J]. INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2016, PT III, 2016, 9773 : 67 - 78
  • [8] CGUFS: A clustering-guided unsupervised feature selection algorithm for gene expression data
    Xu, Zhaozhao
    Yang, Fangyuan
    Wang, Hong
    Sun, Junding
    Zhu, Hengde
    Wang, Shuihua
    Zhang, Yudong
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (09)
  • [9] Gene expression data classification using genetic algorithm-based feature selection
    Sonmez, Oznur Sinem
    Dagtekin, Mustafa
    Ensari, Tolga
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2021, 29 (07) : 3165 - 3179
  • [10] Hybrid feature selection using micro genetic algorithm on microarray gene expression data
    Pragadeesh, C.
    Jeyaraj, Rohana
    Siranjeevi, K.
    Abishek, R.
    Jeyakumar, G.
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (03) : 2241 - 2246