Variational inference for rare variant detection in deep, heterogeneous next-generation sequencing data

Authors
Zhang, Fan [1]
Flaherty, Patrick [1,2]
Affiliations
[1] Worcester Polytechnic Institute, Department of Biomedical Engineering, 100 Institute Road, Worcester, MA 01609, USA
[2] University of Massachusetts, Department of Mathematics and Statistics, 710 North Pleasant Street, Amherst, MA 01003, USA
Source
BMC Bioinformatics | 2017 / Vol. 18
Keywords
Single nucleotide variant detection; Next-generation sequencing; Bayesian statistical method; Variational inference; Somatic point mutations; Cancer; Model
DOI
10.1186/s12859-016-1451-5
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The detection of rare single nucleotide variants (SNVs) in next-generation sequencing (NGS) data is important for understanding genetic heterogeneity. Various computational algorithms have been proposed to detect variants at the single-nucleotide level in mixed samples. Yet the noise inherent in the biological processes involved in NGS technology necessitates statistically accurate methods for identifying true rare variants.

Results: We propose a Bayesian statistical model and a variational expectation-maximization (EM) algorithm to estimate the non-reference allele frequency (NRAF) and identify SNVs in heterogeneous cell populations. We demonstrate that our variational EM algorithm achieves sensitivity and specificity comparable to those of a Markov chain Monte Carlo (MCMC) sampling inference algorithm and is more computationally efficient on tests of relatively low-coverage (27x and 298x) data. Furthermore, we show that our model with variational EM inference has higher specificity than many state-of-the-art algorithms. In an analysis of a longitudinal directed-evolution yeast data set, we identify a time-series trend in NRAF and detect novel variants that have not previously been reported. Our model also detects the emergence of a beneficial variant earlier than previously shown, as well as a pair of concomitant variants.

Conclusions: We developed a variational EM algorithm for a hierarchical Bayesian model to identify rare variants in heterogeneous NGS data. Our algorithm identifies variants across a broad range of read depths and NRAFs with high sensitivity and specificity.
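The abstract refers to estimating the NRAF at each position but does not spell out the model. As a simplified illustration only, the Python sketch below computes a per-position NRAF estimate under a single-site Beta-binomial conjugate model: the non-reference read count is treated as binomial given the read depth, with a Beta(alpha, beta) prior on the NRAF. This is not the paper's hierarchical Bayesian model or its variational EM algorithm. The `PositionCounts` type, the function names, the uniform Beta(1, 1) default prior, and the example count of 3 non-reference reads are hypothetical choices made for illustration; the depth of 298 simply echoes one of the coverage levels mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionCounts:
    """Read counts at one reference position (hypothetical input format)."""
    depth: int   # total reads covering the position
    nonref: int  # reads carrying a non-reference base

    def __post_init__(self) -> None:
        # Reject impossible counts before any estimation is attempted.
        if self.depth <= 0 or not 0 <= self.nonref <= self.depth:
            raise ValueError("need depth > 0 and 0 <= nonref <= depth")


def naive_nraf(c: PositionCounts) -> float:
    """Maximum-likelihood NRAF: fraction of reads that are non-reference."""
    return c.nonref / c.depth


def posterior_nraf(c: PositionCounts, alpha: float = 1.0,
                   beta: float = 1.0) -> tuple[float, float]:
    """Posterior mean and variance of the NRAF under a Beta(alpha, beta)
    prior and a binomial likelihood (conjugate update).

    Simplified single-site illustration; NOT the paper's hierarchical
    model or its variational EM inference.
    """
    a = alpha + c.nonref            # posterior Beta shape parameter a
    b = beta + c.depth - c.nonref   # posterior Beta shape parameter b
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


if __name__ == "__main__":
    # Hypothetical site: 3 non-reference reads out of 298 covering reads.
    site = PositionCounts(depth=298, nonref=3)
    print(f"naive NRAF:     {naive_nraf(site):.5f}")
    mean, var = posterior_nraf(site)
    print(f"posterior mean: {mean:.5f}, sd: {var ** 0.5:.5f}")
```

With few non-reference reads, the uniform prior pulls the posterior mean toward 0.5, so for rare variants an informative prior concentrated near zero, or a hierarchical prior shared across positions in the spirit of the hierarchical model described above, would be more appropriate than this single-site sketch.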
Pages: 10