Gene selection for sample classifications in microarray experiments

Times cited: 22
Authors
Tsai, CA
Chen, CH
Lee, TC
Ho, IC
Yang, UC
Chen, JJ
Affiliations
[1] Natl Ctr Toxicol Res, FDA, Div Biometry & Risk Assessment, Jefferson, AR 72079 USA
[2] Acad Sinica, Inst Stat Sci, Taipei 115, Taiwan
[3] Acad Sinica, Dept Biomed Sci, Taipei 115, Taiwan
[4] Natl Yang Ming Univ, Inst Biopharmaceut Sci, Taipei 112, Taiwan
[5] Natl Yang Ming Univ, Inst Biochem, Taipei 112, Taiwan
DOI
10.1089/dna.2004.23.607
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
DNA microarray technology provides useful tools for profiling global gene expression patterns in different cell and tissue samples. One major challenge is the large number of genes relative to the number of samples. Using all genes can degrade the performance of a classification rule because of the noise contributed by nondiscriminatory genes, so selecting an optimal subset of the original genes becomes an important preliminary step in sample classification. In this study, we propose a family-wise error (FWE) rate approach to selecting discriminatory genes for two-sample or multiple-sample classification. The FWE approach controls the probability of one or more false positives at a prespecified level. A public colon cancer data set is used to evaluate the performance of the proposed approach with two classification methods: k nearest neighbors (k-NN) and support vector machine (SVM). The gene sets selected by the proposed procedure appear to perform better than, or comparably to, several results reported in the literature that used univariate analysis without a multivariate search. In addition, we apply the FWE approach to a toxicogenomic data set with nine treatments (a control and eight metals: As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV), 55 samples in total, for multisample classification. Two gene sets are considered: the gene set Ω_F formed by the ANOVA F-test, and the gene set Ω_T formed by the union of one-versus-all t-tests. Prediction accuracies are evaluated using internal and external cross-validation. With SVM classification, the overall accuracies for assigning each of the 55 samples to one of the nine treatments are above 80% under internal cross-validation, and Ω_F has slightly higher accuracy rates than Ω_T. Under external cross-validation, the overall prediction accuracies are above 70%, and the two gene sets Ω_T and Ω_F perform equally well.
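The abstract does not spell out the exact FWE procedure, so the sketch below is only an illustration under stated assumptions. It selects genes with a per-gene ANOVA F statistic and controls the FWE rate with a permutation-based single-step maxF adjustment (a Westfall–Young style method, swapped in here as one standard way to control FWE; the paper's own procedure may differ). It then classifies with k-NN and contrasts two cross-validation schemes, using the usual meaning of the terms (assumed here): internal CV selects genes once from all samples before splitting, while external CV repeats the selection inside every training fold. All function names (`f_statistic`, `select_genes_fwe`, `knn_predict`, `cross_validate`, `make_data`) and the synthetic data are hypothetical.

```python
import random
import statistics


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across the sample classes."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    k, n = len(groups), len(values)
    grand = statistics.fmean(values)
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = statistics.fmean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_genes_fwe(X, labels, alpha=0.05, n_perm=100, seed=0):
    """Hypothetical FWE gene selection via a single-step permutation maxF test.

    X is a list of samples; each sample is a list of gene expression values.
    """
    rng = random.Random(seed)
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    observed = [f_statistic(col, labels) for col in columns]
    # Null distribution of the largest F over all genes under permuted labels.
    perm, max_null = list(labels), []
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_null.append(max(f_statistic(col, perm) for col in columns))
    # Adjusted p-value: share of permutations whose max F reaches the observed F.
    return [
        j for j, f_obs in enumerate(observed)
        if (1 + sum(m >= f_obs for m in max_null)) / (n_perm + 1) <= alpha
    ]


def knn_predict(train_X, train_y, x, genes, k=3):
    """Majority vote among the k nearest training samples on the selected genes."""
    dists = sorted(
        (sum((row[g] - x[g]) ** 2 for g in genes), lab)
        for row, lab in zip(train_X, train_y)
    )
    votes = {}
    for _, lab in dists[:k]:
        votes[lab] = votes.get(lab, 0) + 1
    return max(votes, key=votes.get)


def cross_validate(X, y, n_folds=5, external=True, alpha=0.05, n_perm=100, seed=0):
    """Return k-NN accuracy under internal or external cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    # Internal CV: genes chosen once from ALL samples (optimistically biased).
    genes = None if external else select_genes_fwe(X, y, alpha, n_perm)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tr_X, tr_y = [X[i] for i in train], [y[i] for i in train]
        # External CV: redo the selection using only the training fold.
        fold_genes = select_genes_fwe(tr_X, tr_y, alpha, n_perm) if external else genes
        correct += sum(knn_predict(tr_X, tr_y, X[i], fold_genes) == y[i] for i in test)
    return correct / len(X)


def make_data(n_per_class=10, n_classes=3, n_genes=100, n_informative=5, seed=1):
    """Synthetic expression data: only the first n_informative genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 3.0 * c
            X.append(row)
            y.append(c)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    print("selected genes:", select_genes_fwe(X, y))
    print("internal CV accuracy:", cross_validate(X, y, external=False))
    print("external CV accuracy:", cross_validate(X, y, external=True))
```

On real data with weaker signal, the internal estimate tends to exceed the external one because the held-out samples already influenced which genes were kept. That is consistent with the internal accuracies (above 80%) being higher than the external ones (above 70%) reported above.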
Pages: 607–614
Number of pages: 8