Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data

Cited by: 7
Authors
Iliadis, Alexandros
Anastassiou, Dimitris
Wang, Xiaodong [1]
Affiliation
[1] Columbia Univ, Ctr Computat Biol & Bioinformat, New York, NY 10027 USA
Source
BMC GENETICS | 2012 / Vol. 13
Keywords
LARGE-SCALE ASSOCIATION; LINKAGE-DISEQUILIBRIUM; POPULATION; IDENTIFICATION; INFORMATION; EFFICIENCY; INFERENCE; SCREEN; TOOL;
DOI
10.1186/1471-2156-13-94
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Typically, the first phase of a genome-wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single-locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data. Results: We introduce a technique for haplotype frequency estimation in a population from pooled DNA samples, focusing on datasets containing a small number of individuals per pool (2 or 3 individuals) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and marker sizes. We demonstrate that our algorithm provides improvements in accuracy and computational time over competing methods for large numbers of markers, while demonstrating comparable performance for smaller marker sizes. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at www.ee.columbia.edu/~anastas/tdspool. Conclusions: Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance in datasets with large numbers of markers and could be the method of choice for haplotype frequency estimation in such datasets.
Pages: 10
Related papers
50 in total
  • [21] Comparative validation of computer programs for haplotype frequency estimation from donor registry data
    Eberhard, H. -P.
    Madbouly, A. S.
    Gourraud, P. A.
    Balere, M. L.
    Feldmann, U.
    Gragert, L.
    Torres, H. Maldonado
    Pingel, J.
    Schmidt, A. H.
    Steiner, D.
    van der Zanden, H. G. M.
    Oudshoorn, M.
    Marsh, S. G. E.
    Maiers, M.
    Mueller, C. R.
    TISSUE ANTIGENS, 2013, 82 (02): : 93 - 105
  • [22] A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants
    Kuk, Anthony Y. C.
    Li, Xiang
    Xu, Jinfeng
    STATISTICS IN MEDICINE, 2013, 32 (08) : 1343 - 1360
  • [23] Estimation of haplotype frequencies from diploid data.
    Abecasis, GR
    Martin, R
    Lewitzky, S
    AMERICAN JOURNAL OF HUMAN GENETICS, 2001, 69 (04) : 198 - 198
  • [24] Multiple haplotype reconstruction from allele frequency data
    Pelizzola, Marta
    Behr, Merle
    Li, Housen
    Munk, Axel
    Futschik, Andreas
    NATURE COMPUTATIONAL SCIENCE, 2021, 1 (04): : 262 - 271
  • [25] Multiple haplotype reconstruction from allele frequency data
    Marta Pelizzola
    Merle Behr
    Housen Li
    Axel Munk
    Andreas Futschik
    Nature Computational Science, 2021, 1 : 262 - 271
  • [26] Haplotype estimation from genotypical data by genetic algorithm
    Azuma R.
    Sakamoto M.
    Furutani H.
    Artificial Life and Robotics, 2009, 13 (2) : 535 - 537
  • [27] A FAST AND ACCURATE ALGORITHM FOR DIPLOID INDIVIDUAL HAPLOTYPE RECONSTRUCTION
    Wu, Jingli
    Liang, Binbin
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2013, 11 (04)
  • [28] Ranbow: A fast and accurate method for polyploid haplotype reconstruction
    Moeinzadeh, M-Hossein
    Yang, Jun
    Muzychenko, Evgeny
    Gallone, Giuseppe
    Heller, David
    Reinert, Knut
    Haas, Stefan
    Vingron, Martin
    PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (05)
  • [29] Estimating Haplotype Frequencies by Combining Data from Large DNA Pools with Database Information
    Gasbarra, Dario
    Kulathinal, Sangita
    Pirinen, Matti
    Sillanpaa, Mikko J.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2011, 8 (01) : 36 - 44
  • [30] Haplotype frequency estimation error analysis in the presence of missing genotype data
    Enda D Kelly
    Fabian Sievers
    Ross McManus
    BMC Bioinformatics, 5