Fast Sampling-Based Whole-Genome Haplotype Block Recognition

Cited by: 8
Authors
Taliun, Daniel [1, 2]
Gamper, Johann [2]
Leser, Ulf [3]
Pattaro, Cristian [1]
Affiliations
[1] EURAC Res, Ctr Biomed, Bolzano, Italy
[2] Free Univ Bozen Bolzano, Fac Comp Sci, Bolzano, Italy
[3] Humboldt Univ, Inst Comp Sci, Berlin, Germany
Keywords
SNP; linkage disequilibrium; haplotype blocks; simultaneous confidence intervals; dimensionality reduction
DOI
10.1109/TCBB.2015.2456897
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Scaling linkage disequilibrium (LD) based haplotype block recognition to the entire human genome has always been a challenge. The best-known algorithm has quadratic runtime complexity and, even when sophisticated search space pruning is applied, still requires several days of computation. Here, we propose a novel sampling-based algorithm, called S-MIG(++), whose main idea is to estimate the area that most likely contains all haplotype blocks by sampling a very small number of SNP pairs. A subsequent refinement step computes the exact blocks by considering only the SNP pairs within the estimated area. This approach significantly reduces the number of computed LD statistics, making the recognition of haplotype blocks very fast. We show theoretically and empirically that the area containing all haplotype blocks can be estimated with a very high degree of certainty. Through experiments on the 243,080 SNPs on chromosome 20 from the 1000 Genomes Project, we compared our previous algorithm MIG(++) with the new S-MIG(++) and observed a runtime reduction from 2.8 weeks to 34.8 hours. In a parallelized version of the S-MIG(++) algorithm using 32 parallel processes, the runtime was further reduced to 5.1 hours.
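The two-step idea in the abstract (sample a few SNP pairs to bound the region where strong LD occurs, then compute exact LD only inside that region) can be illustrated with a toy Python sketch. This is not the S-MIG(++) algorithm: the abstract does not give its LD statistic, sampling scheme, or certainty bound, so everything here is an assumption for illustration. In particular, r² stands in as the LD statistic, the "area" is simplified to a maximum index distance (`band`), the data are simulated, and all function names are hypothetical.

```python
import random


def r_squared(a, b):
    """Squared correlation (r^2) between two 0/1 allele vectors."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # monomorphic SNP: LD undefined, treated as 0 here
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def simulate_snps(n_snps, n_haps, block_len, rng):
    """Toy data (assumption): SNPs in a block copy one founder column, with rare flips."""
    snps = []
    for i in range(n_snps):
        if i % block_len == 0:
            founder = [rng.randint(0, 1) for _ in range(n_haps)]
        snps.append([x ^ (rng.random() < 0.02) for x in founder])
    return snps


def estimate_band(snps, n_samples, threshold, rng):
    """Sampling step: largest index distance at which a sampled pair shows strong LD."""
    n = len(snps)
    band = 1
    for _ in range(n_samples):
        i, j = sorted(rng.sample(range(n), 2))
        if r_squared(snps[i], snps[j]) >= threshold:
            band = max(band, j - i)
    return band


def pairs_within_band(n_snps, band):
    """Number of SNP pairs (i, j) with 0 < j - i <= band (requires band < n_snps)."""
    return band * n_snps - band * (band + 1) // 2


def strong_pairs_in_band(snps, band, threshold):
    """Refinement step: exact LD only for pairs inside the estimated band."""
    n = len(snps)
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, min(i + band, n - 1) + 1)
            if r_squared(snps[i], snps[j]) >= threshold]


def demo(seed=0):
    """Return (estimated band, pairs inside the band, all pairs) for toy data."""
    rng = random.Random(seed)
    snps = simulate_snps(n_snps=400, n_haps=200, block_len=10, rng=rng)
    band = estimate_band(snps, n_samples=2000, threshold=0.5, rng=rng)
    total = len(snps) * (len(snps) - 1) // 2
    return band, pairs_within_band(len(snps), band), total
```

Calling `demo()` returns the estimated band together with the number of pairs inside it and the number of all pairs, which shows how much of the quadratic pair space the refinement step can skip. A plain maximum over sampled pairs can underestimate the true extent of LD, so a real method would need a safety margin or a statistical guarantee of the kind the abstract refers to.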
Pages: 315-325
Page count: 11