OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing

Cited by: 19
Authors
Das, Shreepriya [1]
Vikalo, Haris [1]
Affiliations
[1] Univ Texas Austin, Elect & Comp Engn Dept, Austin, TX 78712 USA
Funding
National Institutes of Health (USA)
DOI
10.1093/bioinformatics/bts256
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Next-generation DNA sequencing platforms are becoming increasingly cost-effective and can provide an enormous number of reads in a relatively short time. However, their accuracy and read lengths still lag behind those of the conventional Sanger sequencing method. The performance of next-generation sequencing platforms is fundamentally limited by various imperfections in the sequencing-by-synthesis and signal acquisition processes. This drives the search for accurate, scalable and computationally tractable base calling algorithms capable of accounting for such imperfections.
Results: Relying on a statistical model of the sequencing-by-synthesis process and the signal acquisition procedure, we develop a computationally efficient base calling method for Illumina's sequencing technology (specifically, the Genome Analyzer II platform). Parameters of the model are estimated via a fast unsupervised online learning scheme, which uses the generalized expectation-maximization algorithm and requires only 3 s of running time per tile (on a single core of an Intel i7 machine at 3.07 GHz), a speed-up of three orders of magnitude over existing parametric model-based methods. To minimize the latency between the end of the sequencing run and the generation of the base calling reports, we develop a fast, scalable online decoding algorithm, which requires only 9 s/tile and achieves significantly lower error rates than Illumina's base calling software. Moreover, it is demonstrated that the proposed online parameter estimation scheme efficiently computes tile-dependent parameters, which can thereafter be provided to the base calling algorithm, resulting in significant improvements over previously developed base calling methods for the considered platform in terms of performance, time/complexity and latency.
Pages: 1677-1683
Number of pages: 7
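The abstract describes estimating model parameters online with an expectation-maximization scheme and then calling bases from the fitted model. The sketch below illustrates that general idea only, under strong assumptions that are not taken from the paper: a toy one-dimensional Gaussian mixture with one component per base, a known shared noise level `sigma`, and a plain stochastic-approximation online EM update rather than the authors' generalized EM scheme or their sequencing-by-synthesis model. All function names (`simulate_intensities`, `responsibilities`, `online_em`, `call_base`) and all numeric values are hypothetical.

```python
import math
import random

BASES = "ACGT"

def simulate_intensities(n, true_means, sigma, seed=0):
    """Draw n toy 1-D intensities, each from a randomly chosen component."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice(true_means), sigma) for _ in range(n)]

def responsibilities(x, means, weights, sigma):
    """E-step for one observation: posterior probability of each component."""
    logs = [math.log(w) - 0.5 * ((x - m) / sigma) ** 2
            for m, w in zip(means, weights)]
    top = max(logs)  # subtract the maximum for numerical stability
    probs = [math.exp(v - top) for v in logs]
    total = sum(probs)
    return [p / total for p in probs]

def online_em(stream, init_means, sigma=0.5, decay=0.6):
    """Update mixture means and weights one observation at a time."""
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    # Running averages of the sufficient statistics E[r_j] and E[r_j * x].
    s0 = list(weights)
    s1 = [w * m for w, m in zip(weights, means)]
    for t, x in enumerate(stream, start=1):
        step = (t + 1) ** -decay  # decreasing step size, always below 1
        r = responsibilities(x, means, weights, sigma)
        for j in range(k):
            s0[j] += step * (r[j] - s0[j])
            s1[j] += step * (r[j] * x - s1[j])
            # M-step: re-derive the parameters from the running statistics.
            weights[j] = s0[j]
            means[j] = s1[j] / s0[j]
    return means, weights

def call_base(x, means, weights, sigma):
    """Call the base whose component has the highest posterior probability."""
    r = responsibilities(x, means, weights, sigma)
    return BASES[r.index(max(r))]

if __name__ == "__main__":
    data = simulate_intensities(4000, [0.0, 3.0, 6.0, 9.0], 0.5)
    means, weights = online_em(data, [0.5, 2.5, 6.5, 8.5])
    print("estimated means:", [round(m, 2) for m in means])
    print("call for x = 5.8:", call_base(5.8, means, weights, 0.5))
```

Each observation is processed once and then discarded, so memory use does not grow with the number of reads. That single-pass property is what makes an online scheme attractive when parameters must be re-estimated per tile.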