OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing

Cited by: 19
Authors
Das, Shreepriya [1 ]
Vikalo, Haris [1 ]
Affiliation
[1] Univ Texas Austin, Elect & Comp Engn Dept, Austin, TX 78712 USA
Funding
US National Institutes of Health (NIH)
DOI
10.1093/bioinformatics/bts256
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Next-generation DNA sequencing platforms are becoming increasingly cost-effective and capable of providing an enormous number of reads in a relatively short time. However, their accuracy and read lengths still lag behind those of the conventional Sanger sequencing method. The performance of next-generation sequencing platforms is fundamentally limited by various imperfections in the sequencing-by-synthesis and signal acquisition processes. This drives the search for accurate, scalable and computationally tractable base calling algorithms capable of accounting for such imperfections.

Results: Relying on a statistical model of the sequencing-by-synthesis process and the signal acquisition procedure, we develop a computationally efficient base calling method for Illumina's sequencing technology (specifically, the Genome Analyzer II platform). Parameters of the model are estimated via a fast, unsupervised online learning scheme that uses the generalized expectation-maximization algorithm and requires only 3 s of running time per tile (on a single core of an Intel i7 machine at 3.07 GHz). This is a speed-up of three orders of magnitude over existing parametric model-based methods. To minimize the latency between the end of the sequencing run and the generation of base calling reports, we develop a fast, scalable online decoding algorithm. It requires only 9 s per tile and achieves significantly lower error rates than Illumina's base calling software. Moreover, we demonstrate that the proposed online parameter estimation scheme efficiently computes tile-dependent parameters, which can then be provided to the base calling algorithm. This results in significant improvements over previously developed base calling methods for this platform in terms of performance, time/complexity and latency.
Pages: 1677-1683
Page count: 7