Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data

Cited by: 7
Authors
Iliadis, Alexandros
Anastassiou, Dimitris
Wang, Xiaodong [1]
Affiliations
[1] Columbia Univ, Ctr Computat Biol & Bioinformat, New York, NY 10027 USA
Source
BMC GENETICS | 2012 / Vol. 13
Keywords
LARGE-SCALE ASSOCIATION; LINKAGE-DISEQUILIBRIUM; POPULATION; IDENTIFICATION; INFORMATION; EFFICIENCY; INFERENCE; SCREEN; TOOL
DOI
10.1186/1471-2156-13-94
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background: Typically, the first phase of a genome-wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single-locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data. Results: We introduce a technique for haplotype frequency estimation in a population from pooled DNA samples, focusing on datasets containing a small number of individuals per pool (2 or 3 individuals) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and marker sizes. We demonstrate that our algorithm provides improvements in accuracy and computational time over competing methods for large numbers of markers while demonstrating comparable performance for smaller marker sizes. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at www.ee.columbia.edu/~anastas/tdspool. Conclusions: Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance in datasets with a large number of markers and could be the method of choice for haplotype frequency estimation in such datasets.
Pages: 10