A flexible approximate likelihood ratio test for detecting differential expression in microarray data

Times cited: 2
Authors
Hossain, Ahmed [1, 2]
Beyene, Joseph [1, 2]
Willan, Andrew R. [1, 2]
Hu, Pingzhao [3]
Affiliations
[1] SickKids Research Institute, Program in Child Health Evaluative Sciences, Biostatistics Methodology Unit, Toronto, ON M5G 1X8, Canada
[2] University of Toronto, Dalla Lana School of Public Health, Toronto, ON M5T 3M7, Canada
[3] SickKids Research Institute, Program in Genetics and Genome Biology, Toronto, ON M5G 1X8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
FALSE DISCOVERY RATE; GENE-EXPRESSION; CLASSIFICATION; MODELS;
DOI
10.1016/j.csda.2009.03.022
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Identifying differentially expressed genes in microarray data has been studied extensively, and several methods have been proposed. Most popular methods for analyzing gene expression microarray data assume normally distributed expression levels and are based on a Wald-type statistic. These methods may be inefficient when expression levels follow a skewed distribution. To deal with possible violations of the normality assumption, we propose a method based on the Generalized Logistic Distribution of Type II (GLDII). The motivation behind this distributional assumption is to allow longer tails than the normal distribution, which is important when analyzing gene expression data because extreme values are common in such experiments. The shape parameter of the GLDII provides the flexibility to model a wide range of distributions. To reduce the computational cost of carrying out Likelihood Ratio (LR) tests for several thousand genes, we propose an Approximate LR Test (ALRT). We also generalize the two-class ALRT method to multi-class microarray data. Using simulation, we compare the performance of the ALRT method under the GLDII assumption with that of methods based on Wald-type statistics. The simulation results show that our method performs quite well compared with the significance analysis of microarrays (SAM) approach using standardized Wilcoxon rank statistics and with empirical Bayes (E-B) t-statistics. Our method is also less sensitive to extreme values. We illustrate the method using two publicly available gene expression data sets. (C) 2009 Elsevier B.V. All rights reserved.
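The abstract does not state the likelihood explicitly. As a hedged illustration only, the Python sketch below assumes one common form of the Type II generalized logistic density, f(z) = α·exp(−αz) / (1 + exp(−z))^(α+1) with z = (x − μ)/σ (α = 1 gives the ordinary logistic), and runs an ordinary, not approximate, two-class LR test for a location shift in a single gene: Λ = 2[ℓ(μ̂₁) + ℓ(μ̂₂) − ℓ(μ̂₀)], referred to a χ² distribution with 1 degree of freedom. Treating the shape α and scale σ as known is a simplifying assumption, and the paper's ALRT is not reproduced here. All function names are hypothetical.

```python
import math


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))


def gldii_logpdf(x: float, alpha: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Log-density of a location-scale Type II generalized logistic variable.

    Assumed standard form: f(z) = alpha * exp(-alpha*z) / (1 + exp(-z))**(alpha + 1),
    with z = (x - mu) / sigma; the paper's parameterization may differ.
    """
    z = (x - mu) / sigma
    return math.log(alpha) - alpha * z - (alpha + 1.0) * softplus(-z) - math.log(sigma)


def loglik(xs, alpha, mu, sigma):
    """Sum of GLDII log-densities over one group of expression values."""
    return sum(gldii_logpdf(x, alpha, mu, sigma) for x in xs)


def argmax_concave(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a concave function on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:  # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def max_loglik(xs, alpha, sigma):
    """Log-likelihood maximized over mu, with alpha and sigma held fixed (assumption)."""
    # The mode of the standard density sits at z = -log(alpha); widen the bracket accordingly.
    spread = (10.0 + abs(math.log(alpha))) * sigma
    mu_hat = argmax_concave(lambda m: loglik(xs, alpha, m, sigma),
                            min(xs) - spread, max(xs) + spread)
    return loglik(xs, alpha, mu_hat, sigma)


def lr_test_location(x1, x2, alpha=1.0, sigma=1.0):
    """Two-class LR test of H0: mu1 == mu2 for one gene; returns (statistic, p-value)."""
    ll_alt = max_loglik(x1, alpha, sigma) + max_loglik(x2, alpha, sigma)
    ll_null = max_loglik(list(x1) + list(x2), alpha, sigma)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))  # clamp tiny negative rounding error
    # Upper tail of chi-square with 1 df: P(chi2_1 > t) = erfc(sqrt(t / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


group_a = [0.1, -0.3, 0.5, 0.2, -0.1, 0.0]
group_b = [x + 5.0 for x in group_a]  # hypothetical shifted class
stat, p_value = lr_test_location(group_a, group_b, alpha=1.0, sigma=1.0)
```

Golden-section search suffices here because the log of this density is concave in z, so each group's log-likelihood has a single maximum in μ. A real analysis would also need to estimate α and σ and to control multiplicity across thousands of genes.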
Pages: 3685-3695
Number of pages: 11