HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning

Authors
Olivia Choudhury
Ankush Chakrabarty
Scott J. Emrich
Affiliations
[1] Postdoctoral Researcher, Visiting Research Scientist
[2] IBM Research, Associate Professor, Department of Electrical Engineering and Computer Science
[3] Mitsubishi Electric Research Laboratories
[4] University of Tennessee
Abstract
Second-generation DNA sequencing techniques generate short reads that can result in fragmented genome assemblies. Third-generation sequencing platforms mitigate this limitation by producing longer reads that span across complex and repetitive regions. However, the usefulness of such long reads is limited because of high sequencing error rates. To exploit the full potential of these longer reads, it is imperative to correct the underlying errors. We propose HECIL—Hybrid Error Correction with Iterative Learning—a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read alignments. We demonstrate that HECIL outperforms state-of-the-art error correction algorithms for an overwhelming majority of evaluation metrics on diverse, real-world data sets including E. coli, S. cerevisiae, and the malaria vector mosquito A. funestus. Additionally, we provide an optional avenue of improving the performance of HECIL’s core algorithm by introducing an iterative learning paradigm that enhances the correction policy at each iteration by incorporating knowledge gathered from previous iterations via data-driven confidence metrics assigned to prior corrections.
Related papers
50 in total
  • [1] HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning
    Choudhury, Olivia
    Chakrabarty, Ankush
    Emrich, Scott J.
    Scientific Reports, 2018, 8
  • [2] A hybrid and scalable error correction algorithm for indel and substitution errors of long reads
    Das, Arghya Kusum
    Goswami, Sayan
    Lee, Kisung
    Park, Seung-Jong
    BMC Genomics, 2019, 20 (Suppl 11)
  • [3] Jabba: hybrid error correction for long sequencing reads
    Miclotte, Giles
    Heydari, Mahdi
    Demeester, Piet
    Rombauts, Stephane
    Van de Peer, Yves
    Audenaert, Pieter
    Fostier, Jan
    Algorithms for Molecular Biology, 2016, 11
  • [4] Hercules: a profile HMM-based hybrid error correction algorithm for long reads
    Firtina, Can
    Bar-Joseph, Ziv
    Alkan, Can
    Cicek, A. Ercument
    Nucleic Acids Research, 2018, 46 (21): e125
  • [5] Efficient Hybrid De Novo Error Correction and Assembly for Long Reads
    Kchouk, Mehdi
    Elloumi, Mourad
    2016 27th International Workshop on Database and Expert Systems Applications (DEXA), 2016: 88-92
  • [6] A comparative evaluation of hybrid error correction methods for error-prone long reads
    Fu, Shuhua
    Wang, Anqi
    Au, Kin Fai
    Genome Biology, 2019, 20 (1)
  • [7] Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly
    Sameith, Katrin
    Roscito, Juliana G.
    Hiller, Michael
    Briefings in Bioinformatics, 2017, 18 (01): 1-8