HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning

Cited by: 0
Authors
Olivia Choudhury
Ankush Chakrabarty
Scott J. Emrich
Affiliations
[1] Postdoctoral Researcher, Visiting Research Scientist
[2] IBM Research, Associate Professor, Department of Electrical Engineering and Computer Science
[3] Mitsubishi Electric Research Laboratories
[4] University of Tennessee
DOI
Not available
Abstract
Second-generation DNA sequencing techniques generate short reads that can result in fragmented genome assemblies. Third-generation sequencing platforms mitigate this limitation by producing longer reads that span complex and repetitive regions. However, the usefulness of such long reads is limited by their high sequencing error rates. To exploit the full potential of these longer reads, it is imperative to correct the underlying errors. We propose HECIL (Hybrid Error Correction with Iterative Learning), a hybrid error correction framework that determines a correction policy for erroneous long reads based on optimal combinations of decision weights obtained from short read alignments. We demonstrate that HECIL outperforms state-of-the-art error correction algorithms for an overwhelming majority of evaluation metrics on diverse, real-world data sets including E. coli, S. cerevisiae, and the malaria vector mosquito A. funestus. Additionally, we provide an optional avenue for improving the performance of HECIL’s core algorithm by introducing an iterative learning paradigm that enhances the correction policy at each iteration by incorporating knowledge gathered from previous iterations via data-driven confidence metrics assigned to prior corrections.
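
The abstract does not give the decision weights or the confidence metric, so the following is only a minimal sketch of the general idea, not HECIL's actual implementation. It assumes two hypothetical weights per aligned short read (the fraction of short reads agreeing on a base, and a normalized alignment similarity) and a hypothetical score threshold `min_score` standing in for the confidence metric. The names `pick_base` and `correct_read` are illustrative.

```python
from collections import Counter


def pick_base(candidates):
    """Pick a replacement base for one long-read position.

    candidates: non-empty list of (base, similarity) pairs, one per short
    read aligned over the position; similarity is assumed to be positive.
    The score is an assumed sum of two normalized weights, not the
    weighting used in the paper.
    """
    counts = Counter(base for base, _ in candidates)
    total = len(candidates)
    max_sim = max(sim for _, sim in candidates)
    best = None
    for base, sim in candidates:
        # consensus weight + normalized alignment-similarity weight
        score = counts[base] / total + sim / max_sim
        if best is None or score > best[1]:
            best = (base, score)
    return best


def correct_read(read, pileups, min_score):
    """Apply only high-confidence corrections in one iteration.

    pileups: dict mapping a 0-based position to its candidate list.
    Returns the partially corrected read and the positions deferred to a
    later iteration (for example, after re-aligning the short reads).
    """
    bases = list(read)
    deferred = []
    for pos, candidates in sorted(pileups.items()):
        base, score = pick_base(candidates)
        if score >= min_score:
            bases[pos] = base
        else:
            deferred.append(pos)
    return "".join(bases), deferred


pileups = {
    1: [("G", 0.9), ("G", 0.8), ("C", 0.95)],
    3: [("T", 0.5), ("A", 0.5)],
}
corrected, deferred = correct_read("ACGT", pileups, min_score=1.6)
print(corrected, deferred)
```

In this toy input, position 1 has a clear majority with high similarity and is corrected, while position 3 is an even split and is left for a later pass. Deferring ambiguous positions in this way is one simple reading of the iterative learning described in the abstract.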