Gene and sample selection using T-score with sample selection

Cited by: 12
Authors
Mundra, Piyushkumar A. [1]
Rajapakse, Jagath C. [1, 2, 3]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Bioinformat Res Ctr, Singapore, Singapore
[2] Singapore MIT Alliance, Singapore, Singapore
[3] MIT, Dept Biol Engn, Cambridge, MA 02139 USA
Keywords
Feature selection; Gene expression; Logistic regression; SVM-RFE; Approximate support vectors; CANCER CLASSIFICATION; MICROARRAY DATA; VARIABLE SELECTION; RANDOM FOREST; PREDICTION; FILTER; TUMOR
DOI
10.1016/j.jbi.2015.11.003
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Gene selection from high-dimensional microarray gene-expression data is a statistically challenging problem. Filter approaches to gene selection have been popular because of their simplicity, efficiency, and accuracy. Because sample sizes are small, all samples are generally used to compute the relevant ranking statistics, and the selection of samples in filter-based gene selection methods has not been addressed. In this paper, we extend a previously proposed simultaneous sample and gene selection approach. In a backward elimination method, a modified logistic regression loss function is used to select relevant samples at each iteration, and these samples are used to compute the T-score to rank genes. This method provides a compromise solution between the T-score and other support vector machine (SVM) based algorithms. The performance is demonstrated on both simulated and real datasets with criteria such as classification performance, stability, and redundancy. Results indicate that the computational complexity and stability of the method are improved compared to SVM-based methods without compromising classification performance. © 2015 Elsevier Inc. All rights reserved.
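The loop described in the abstract (select samples, compute T-scores on those samples, drop the lowest-ranked gene, repeat) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the modified logistic regression loss, so `select_near_boundary` is a hypothetical stand-in that keeps the samples closest to the midpoint between the two classes. The function names, the `keep_per_class` parameter, and the one-gene-per-iteration elimination schedule are all assumptions made for illustration.

```python
from statistics import mean, variance


def t_score(pos, neg):
    """Absolute two-sample T-score of one gene (unequal-variance form)."""
    denom = (variance(pos) / len(pos) + variance(neg) / len(neg)) ** 0.5
    return abs(mean(pos) - mean(neg)) / denom if denom > 0 else 0.0


def select_near_boundary(X, y, genes, keep_per_class):
    """Hypothetical stand-in selector, NOT the paper's modified logistic loss.

    Projects each sample onto the class-mean-difference direction and keeps,
    per class, the samples closest to the midpoint between the classes.
    Each class needs at least two samples so a variance can be computed.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    w = {g: mean(X[i][g] for i in pos) - mean(X[i][g] for i in neg)
         for g in genes}
    proj = [sum(w[g] * X[i][g] for g in genes) for i in range(len(X))]
    mid = (mean(proj[i] for i in pos) + mean(proj[i] for i in neg)) / 2
    chosen = []
    for group in (pos, neg):
        ranked = sorted(group, key=lambda i: abs(proj[i] - mid))
        chosen.extend(ranked[:max(2, keep_per_class)])
    return chosen


def backward_elimination(X, y, n_keep, keep_per_class=3):
    """Drop the lowest-scoring gene per iteration until n_keep genes remain."""
    genes = list(range(len(X[0])))
    while len(genes) > n_keep:
        # Re-select samples using only the genes that are still active.
        idx = select_near_boundary(X, y, genes, keep_per_class)
        scores = {}
        for g in genes:
            pos = [X[i][g] for i in idx if y[i] == 1]
            neg = [X[i][g] for i in idx if y[i] == 0]
            scores[g] = t_score(pos, neg)
        genes.remove(min(genes, key=scores.get))
    return genes
```

Here `X` is a list of samples (each a list of expression values) and `y` holds 0/1 class labels; `backward_elimination(X, y, n_keep=10)` would return the indices of the ten genes that survive elimination under these assumptions.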
Pages: 31-41
Number of pages: 11