Fast MCMC Sampling for Hidden Markov Models to Determine Copy Number Variations

Cited by: 6
Authors
Mahmud, Md Pavel [1]
Schliep, Alexander [1, 2]
Affiliations
[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
[2] Rutgers State Univ, BioMaPS Inst Quantitat Biol, Piscataway, NJ 08854 USA
Source
BMC Bioinformatics | 2011 / Vol. 12
Keywords
CGH DATA; ARRAY; SEGMENTATION; DISTRIBUTIONS; ERROR;
DOI
10.1186/1471-2105-12-428
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons, the parameters of an HMM are often estimated with maximum likelihood, and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches that integrate out the parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next-generation sequencing amplify these problems.
Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling.
Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and on high-density SNP arrays, and demonstrate a speed-up of 10 to 60 on the former and 90 on the latter, while achieving results competitive with state-of-the-art Bayesian approaches.
Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
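Illustration (not part of the published abstract): the abstract refers to MCMC sampling of HMM segmentations. A common building block of such samplers is forward filtering followed by backward sampling of the state path given fixed parameters. The sketch below shows that exact step for Gaussian emissions. It is not the authors' approximate kd-tree method and it does not use the GHMM API; the function names, the three-state model, and all parameter values are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sample_state_path(obs, init, trans, means, sds, rng):
    """Draw one state path from p(states | obs, parameters) by
    forward filtering followed by backward sampling."""
    n_states = len(init)
    # Forward pass: alpha[t][k] is proportional to p(state_t = k | obs[0..t]).
    alpha = []
    for t, x in enumerate(obs):
        row = []
        for k in range(n_states):
            if t == 0:
                prior = init[k]
            else:
                prior = sum(alpha[t - 1][j] * trans[j][k] for j in range(n_states))
            row.append(prior * gaussian_pdf(x, means[k], sds[k]))
        total = sum(row)
        alpha.append([v / total for v in row])  # normalise to limit underflow

    # Backward pass: sample the last state, then each earlier state
    # conditioned on the state already drawn for the next position.
    path = [0] * len(obs)
    path[-1] = rng.choices(range(n_states), weights=alpha[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        weights = [alpha[t][j] * trans[j][path[t + 1]] for j in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=weights)[0]
    return path


# Hypothetical toy data: log-ratios for a loss, a neutral and a gain segment.
rng = random.Random(0)
obs = [-1.0, -0.9, -1.1, 0.05, -0.02, 0.1, 0.95, 1.05, 1.0]
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
means = [-1.0, 0.0, 1.0]  # assumed state means: loss, neutral, gain
sds = [0.2, 0.2, 0.2]

paths = [sample_state_path(obs, init, trans, means, sds, rng) for _ in range(200)]
# Most frequently sampled state at each position.
consensus = [Counter(p[t] for p in paths).most_common(1)[0][0] for t in range(len(obs))]
```

In a full Bayesian sampler this step would alternate with draws of the transition and emission parameters; here the parameters are held fixed, so the spread across the sampled paths reflects only uncertainty in the segmentation.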
Pages: 17
Related papers
50 in total
  • [1] Fast MCMC sampling for hidden markov models to determine copy number variations
    Md Pavel Mahmud
    Alexander Schliep
    [J]. BMC Bioinformatics, 12
  • [2] Estimation of nonstationary hidden Markov models by MCMC sampling
    Djuric, PM
    Chun, JH
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 1737 - 1740
  • [3] Augmented Ensemble MCMC sampling in Factorial Hidden Markov Models
    Martens, Kaspar
    Titsias, Michalis K.
    Yau, Christopher
    [J]. 22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [4] An MCMC sampling approach to estimation of nonstationary hidden Markov models
    Djuric, PM
    Chun, JH
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2002, 50 (05) : 1113 - 1123
  • [5] Variational Inference for Coupled Hidden Markov Models Applied to the Joint Detection of Copy Number Variations
    Wang, Xiaoqiang
    Lebarbier, Emilie
    Aubere, Julie
    Robin, Stephane
    [J]. INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2019, 15 (01):
  • [6] Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression
    Wiedenhoeft, John
    Brugel, Eric
    Schliep, Alexander
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2016, 12 (05)
  • [7] Improved detection algorithm for copy number variations based on hidden Markov model
    Yang, Hai
    Zhu, Daming
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (13-14) : 9237 - 9253
  • [8] Improved detection algorithm for copy number variations based on hidden Markov model
    Hai Yang
    Daming Zhu
    [J]. Multimedia Tools and Applications, 2020, 79 : 9237 - 9253
  • [9] Fast MCMC Sampling for Markov Jump Processes and Extensions
    Rao, Vinayak
    Teh, Yee Whye
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2013, 14 : 3295 - 3320