A New Expectation-Maximization Statistical Test for Case-Control Association Studies Considering Rare Variants Obtained by High-Throughput Sequencing

Cited by: 5
Authors
Gordon, Derek [1]
Finch, Stephen J. [2]
De La Vega, Francisco
Affiliations
[1] Rutgers State Univ, Dept Genet, Piscataway, NJ USA
[2] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
Keywords
Statistic; Genetics; Noncentrality parameter; Power; Misclassification; Sequence; Expectation-maximization; Multi-locus; CONTROL GENETIC ASSOCIATION; FALSE DISCOVERY RATE; GENOTYPE MISCLASSIFICATION; MISSING HERITABILITY; SAMPLE-SIZE; ERROR RATE; POWER; PHENOTYPE; HAPLOTYPE; DISEASES;
DOI
10.1159/000325590
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk, and after meta-analysis of large cohorts, a large fraction of expected heritability still remains unexplained. A possible explanation is that rare variants currently undetected by GWAS with SNP arrays could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples to test for disease association, including association with rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single locus results. Copyright (C) 2011 S. Karger AG, Basel
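The abstract mentions using the expectation-maximization algorithm to obtain the probability of a true genotype from observed reads that may contain random errors. The sketch below illustrates that general idea only; it is not the authors' statistic. It assumes a single biallelic locus, a known symmetric per-read error rate, and binomially distributed read counts. The function names (`read_likelihoods`, `e_step`, `em_genotype_posteriors`) and the `error_rate` parameter are hypothetical, and neither the case-control test nor the SumStat step is implemented.

```python
from math import comb


def read_likelihoods(reads, error_rate):
    """Likelihood of each individual's reads under genotypes 0, 1, 2.

    reads: list of (n_ref, n_alt) read counts, one pair per individual.
    Genotype g is the number of copies of the alternate allele.
    Assumption: a read shows the alternate allele with probability
    error_rate (g=0), 0.5 (g=1), or 1 - error_rate (g=2).
    """
    p_alt = [error_rate, 0.5, 1.0 - error_rate]
    lik = []
    for n_ref, n_alt in reads:
        n = n_ref + n_alt
        lik.append([comb(n, n_alt) * p ** n_alt * (1 - p) ** n_ref
                    for p in p_alt])
    return lik


def e_step(freqs, lik):
    """Posterior genotype probabilities given current genotype frequencies."""
    post = []
    for row in lik:
        weights = [f * l for f, l in zip(freqs, row)]
        total = sum(weights)
        post.append([w / total for w in weights])
    return post


def em_genotype_posteriors(reads, error_rate=0.01, n_iter=200, tol=1e-10):
    """Estimate genotype frequencies and per-individual posteriors by EM."""
    if not reads:
        raise ValueError("reads must contain at least one individual")
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must lie strictly between 0 and 0.5")
    lik = read_likelihoods(reads, error_rate)
    freqs = [1 / 3, 1 / 3, 1 / 3]  # uniform starting values
    for _ in range(n_iter):
        post = e_step(freqs, lik)
        # M-step: each frequency is the mean posterior for that genotype.
        new_freqs = [sum(p[g] for p in post) / len(post) for g in range(3)]
        change = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs, e_step(freqs, lik)
```

Calling `em_genotype_posteriors([(20, 0), (10, 10), (0, 20), (19, 1)])` returns the estimated genotype frequencies and, for each individual, a probability for each of the three genotypes. Treating the error rate as known is a simplification chosen for brevity; the abstract does not say how the published method handles it.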
Pages: 113-125
Page count: 13