Phasing of Many Thousands of Genotyped Samples

Cited by: 81
Authors
Williams, Amy L. [1, 2]
Patterson, Nick [2 ]
Glessner, Joseph [3 ]
Hakonarson, Hakon [3 ]
Reich, David [1, 2]
Affiliations
[1] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
[2] Broad Inst Harvard & MIT, Cambridge, MA 02142 USA
[3] Childrens Hosp Philadelphia, Philadelphia, PA 19104 USA
Funding
Wellcome Trust (UK); National Institutes of Health (USA)
Keywords
GENOME-WIDE ASSOCIATION; LINKAGE DISEQUILIBRIUM; HAPLOTYPE INFERENCE; HUMAN-POPULATIONS; IMPUTATION; DISEASES; COMMON; LOCI;
DOI
10.1016/j.ajhg.2012.06.013
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Haplotypes are an important resource for a large number of applications in human genetics, but computationally inferred haplotypes are subject to switch errors that decrease their utility. The accuracy of computationally inferred haplotypes increases with sample size, and although ever larger genotypic data sets are being generated, the fact that existing methods require substantial computational resources limits their applicability to data sets containing tens or hundreds of thousands of samples. Here, we present HAPI-UR (haplotype inference for unrelated samples), an algorithm that is designed to handle unrelated and/or trio and duo family data, that has accuracy comparable to or greater than existing methods, and that is computationally efficient and can be applied to 100,000 samples or more. We use HAPI-UR to phase a data set with 58,207 samples and show that it achieves practical runtime and that switch errors decrease with sample size even with the use of samples from multiple ethnicities. Using a data set with 16,353 samples, we compare HAPI-UR to Beagle, MaCH, IMPUTE2, and SHAPEIT and show that HAPI-UR runs 18x faster than all methods and has a lower switch-error rate than do other methods except for Beagle; with the use of consensus phasing, running HAPI-UR three times gives a slightly lower switch-error rate than Beagle does and is more than six times faster. We demonstrate results similar to those from Beagle on another data set with a higher marker density. Lastly, we show that HAPI-UR has better runtime scaling properties than does Beagle so that for larger data sets, HAPI-UR will be practical and will have an even larger runtime advantage. HAPI-UR is available online (see Web Resources).
Pages: 238-251
Page count: 14
Related papers
50 in total
  • [31] Linkage disequilibrium and haplotype homozygosity in population samples genotyped at a high marker density
    Wang, Hui
    Lin, Chia-Ho
    Service, Susan
    Chen, Yuguo
    Freimer, Nelson
    Sabatti, Chiara
    HUMAN HEREDITY, 2006, 62 (04) : 175 - 189
  • [32] 2.7 million samples genotyped for HLA by next generation sequencing: lessons learned
    Gerhard Schöfl
    Kathrin Lang
    Philipp Quenzel
    Irina Böhme
    Jürgen Sauter
    Jan A. Hofmann
    Julia Pingel
    Alexander H. Schmidt
    Vinzenz Lange
    BMC Genomics, 18
  • [33] 2.7 million samples genotyped for HLA by next generation sequencing: lessons learned
    Schoefl, Gerhard
    Lang, Kathrin
    Quenzel, Philipp
    Boehme, Irina
    Sauter, Juergen
    Hofmann, Jan A.
    Pingel, Julia
    Schmidt, Alexander H.
    Lange, Vinzenz
    BMC GENOMICS, 2017, 18
  • [34] Comparison of results for whole genome association analysis of schizophrenia using pooled DNA samples and individually genotyped samples
    Raelson, John Verner
    Croteau, Pascal
    Perepetchai, Valeri
    Van Eerdewegh, Paul
    Allard, Rene
    Fournier, Helene
    Laplante, Nathalie
    Lapalme, Micheline
    Nguyen-Huu, Quyuh
    Paquin, Nouzha
    Paquin, Bruno
    Segal, Jonathan
    Vidal, Jean-Michel
    Keith, Tim
    Belouchi, Majid
    AMERICAN JOURNAL OF MEDICAL GENETICS PART B-NEUROPSYCHIATRIC GENETICS, 2006, 141B (07) : 718 - 718
  • [35] Many thousands gone: The first two centuries of slavery in North America.
    Edmonds, AO
    LIBRARY JOURNAL, 1998, 123 (15) : 90 - 90
  • [36] Many thousands gone: The first two centuries of slavery in North America.
    Hackney, S
    JOURNAL OF INTERDISCIPLINARY HISTORY, 1999, 30 (03) : 525 - 527
  • [37] THE MANY WAYS TO SHAKE SAMPLES
    May, Mike (mike@techtyper.com), 1600, LabX Media Group (12):
  • [38] Many thousands gone: The first two centuries of slavery in North America.
    Maxwell, K
    FOREIGN AFFAIRS, 1998, 77 (06) : 159 - 160
  • [39] Many thousands gone: The first two centuries of slavery in North America.
    Davis, DB
    AMERICAN HISTORICAL REVIEW, 1999, 104 (04) : 1286 - 1287