Benchmarking of computational error-correction methods for next-generation sequencing data

Cited by: 0
Authors
Mitchell, Keith [1]
Brito, Jaqueline J. [2]
Mandric, Igor [1, 3]
Wu, Qiaozhen [1]
Knyazev, Sergey [3]
Chang, Sei [1]
Martin, Lana S. [2]
Karlsberg, Aaron [2]
Gerasimov, Ekaterina [3]
Littman, Russell [1]
Hill, Brian L. [1]
Wu, Nicholas C. [4]
Yang, Harry [1]
Hsieh, Kevin [1]
Chen, Linus [1]
Littman, Eli [1]
Shabani, Taylor [1]
Enik, German [1]
Yao, Douglas [1]
Sun, Ren [1]
Schroeder, Jan [5]
Eskin, Eleazar [1]
Zelikovsky, Alex [6, 7]
Skums, Pavel [3]
Pop, Mihai [8]
Mangul, Serghei [1, 2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] USC, Los Angeles, CA USA
[3] Georgia State Univ, Atlanta, GA 30303 USA
[4] Scripps Res Inst, La Jolla, CA USA
[5] Monash Univ, Clayton, Vic, Australia
[6] GSU, Atlanta, GA USA
[7] MSMU, Moscow, Russia
[8] Univ Maryland, Baltimore, MD USA
DOI
10.1145/3388440.3414209
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
Recent advances in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error-correction algorithms remains unknown. In this paper, we evaluate the ability of error-correction algorithms to fix errors across different types of datasets with various levels of heterogeneity. We perform a realistic evaluation of several error-correction tools. To measure the efficacy of these techniques, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. In terms of accuracy, we find that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. We also identify the techniques that offer a good balance between precision and sensitivity. This highlight showcases our paper's main findings [1], showing the advantages and limitations of computational error-correction techniques across different domains of biology, including immunogenomics and virology.
Pages: 1
Related papers
50 in total
  • [1] Benchmarking of computational error-correction methods for next-generation sequencing data
    Mitchell, Keith
    Brito, Jaqueline J.
    Mandric, Igor
    Wu, Qiaozhen
    Knyazev, Sergey
    Chang, Sei
    Martin, Lana S.
    Karlsberg, Aaron
    Gerasimov, Ekaterina
    Littman, Russell
    Hill, Brian L.
    Wu, Nicholas C.
    Yang, Harry Taegyun
    Hsieh, Kevin
    Chen, Linus
    Littman, Eli
    Shabani, Taylor
    Enik, German
    Yao, Douglas
    Sun, Ren
    Schroeder, Jan
    Eskin, Eleazar
    Zelikovsky, Alex
    Skums, Pavel
    Pop, Mihai
    Mangul, Serghei
    [J]. GENOME BIOLOGY, 2020, 21 (01)
  • [2] Benchmarking of computational error-correction methods for next-generation sequencing data
    Keith Mitchell
    Jaqueline J. Brito
    Igor Mandric
    Qiaozhen Wu
    Sergey Knyazev
    Sei Chang
    Lana S. Martin
    Aaron Karlsberg
    Ekaterina Gerasimov
    Russell Littman
    Brian L. Hill
    Nicholas C. Wu
    Harry Taegyun Yang
    Kevin Hsieh
    Linus Chen
    Eli Littman
    Taylor Shabani
    German Enik
    Douglas Yao
    Ren Sun
    Jan Schroeder
    Eleazar Eskin
    Alex Zelikovsky
    Pavel Skums
    Mihai Pop
    Serghei Mangul
    [J]. Genome Biology, 21
  • [3] A survey of error-correction methods for next-generation sequencing
    Yang, Xiao
    Chockalingam, Sriram P.
    Aluru, Srinivas
    [J]. BRIEFINGS IN BIOINFORMATICS, 2013, 14 (01) : 56 - 66
  • [4] Effects of error-correction of heterozygous next-generation sequencing data
    Fujimoto, M. Stanley
    Bodily, Paul M.
    Okuda, Nozomu
    Clement, Mark J.
    Snell, Quinn
    [J]. BMC BIOINFORMATICS, 2014, 15
  • [5] Effects of error-correction of heterozygous next-generation sequencing data
    M Stanley Fujimoto
    Paul M Bodily
    Nozomu Okuda
    Mark J Clement
    Quinn Snell
    [J]. BMC Bioinformatics, 15
  • [6] MapReduce for accurate error correction of next-generation sequencing data
    Zhao, Liang
    Chen, Qingfeng
    Li, Wencui
    Jiang, Peng
    Wong, Limsoon
    Li, Jinyan
    [J]. BIOINFORMATICS, 2017, 33 (23) : 3844 - 3851
  • [7] PAGANtec: OpenMP Parallel Error Correction for Next-Generation Sequencing Data
    Joppich, Markus
    Schmidl, Dirk
    Bolger, Anthony M.
    Kuhlen, Torsten
    Usadel, Bjoern
    [J]. OPENMP: HETEROGENOUS EXECUTION AND DATA MOVEMENTS, IWOMP 2015, 2015, 9342 : 3 - 17
  • [8] Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies
    Zagordi, Osvaldo
    Klein, Rolf
    Daeumer, Martin
    Beerenwinkel, Niko
    [J]. NUCLEIC ACIDS RESEARCH, 2010, 38 (21) : 7400 - 7409
  • [9] Human Retrotransposons and Effective Computational Detection Methods for Next-Generation Sequencing Data
    Lee, Haeun
    Min, Jun Won
    Mun, Seyoung
    Han, Kyudong
    [J]. LIFE-BASEL, 2022, 12 (10):
  • [10] A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis
    Akogwu, Isaac
    Wang, Nan
    Zhang, Chaoyang
    Gong, Ping
    [J]. HUMAN GENOMICS, 2016, 10