Benchmarking of computational error-correction methods for next-generation sequencing data

Authors
Mitchell, Keith [1]
Brito, Jaqueline J. [2]
Mandric, Igor [1, 3]
Wu, Qiaozhen [1]
Knyazev, Sergey [3]
Chang, Sei [1]
Martin, Lana S. [2]
Karlsberg, Aaron [2]
Gerasimov, Ekaterina [3]
Littman, Russell [1]
Hill, Brian L. [1]
Wu, Nicholas C. [4]
Yang, Harry [1]
Hsieh, Kevin [1]
Chen, Linus [1]
Littman, Eli [1]
Shabani, Taylor [1]
Enik, German [1]
Yao, Douglas [1]
Sun, Ren [1]
Schroeder, Jan [5]
Eskin, Eleazar [1]
Zelikovsky, Alex [6, 7]
Skums, Pavel [3]
Pop, Mihai [8]
Mangul, Serghei [1, 2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] USC, Los Angeles, CA USA
[3] Georgia State Univ, Atlanta, GA 30303 USA
[4] Scripps Res Inst, La Jolla, CA USA
[5] Monash Univ, Clayton, Vic, Australia
[6] GSU, Atlanta, GA USA
[7] MSMU, Moscow, Russia
[8] Univ Maryland, Baltimore, MD USA
DOI
10.1145/3388440.3414209
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error-correction algorithms remains unknown. In this paper, we evaluate the ability of error-correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We perform a realistic evaluation of several error-correction tools. To measure the efficacy of these techniques, we apply a UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and raw reads, yielding error-free reads against which the corrected reads can be compared. In terms of accuracy, we find that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. We also identify the techniques that offer a good balance between precision and sensitivity. This highlight summarizes the main findings of our paper [1], showing the advantages and limitations of computational error-correction techniques across different domains of biology, including immunogenomics and virology.
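The evaluation described above compares corrected reads against error-free reads. As a minimal sketch of that idea, and not the authors' pipeline, the Python below (1) collapses reads that share a unique molecular identifier (UMI) into a majority-vote consensus, and (2) classifies each base of a corrected read as a true positive, false positive, false negative, or true negative relative to that consensus, then computes precision and sensitivity. The helper names (`umi_consensus`, `classify_bases`, `BaseCounts`), the majority-vote rule, and the per-base classification scheme are illustrative assumptions; the sketch handles substitution errors only and assumes reads are already grouped by UMI and aligned to equal length.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical helpers illustrating UMI-based benchmarking; not the authors' code.
# Assumes substitution errors only and reads pre-grouped by UMI, equal length.


def umi_consensus(reads: Sequence[str]) -> str:
    """Collapse reads sharing one UMI into a per-position majority-vote consensus."""
    if not reads:
        raise ValueError("a UMI group needs at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch expects equal-length, pre-aligned reads")
    consensus = []
    for i in range(length):
        # Most frequent base at position i; ties go to the base seen first.
        base, _ = Counter(r[i] for r in reads).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


@dataclass
class BaseCounts:
    tp: int = 0  # erroneous base changed to the true base
    fp: int = 0  # correct base changed into an error
    fn: int = 0  # erroneous base still wrong after correction
    tn: int = 0  # correct base left correct

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def sensitivity(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0


def classify_bases(raw: str, corrected: str, true: str,
                   counts: Optional[BaseCounts] = None) -> BaseCounts:
    """Tally per-base outcomes of one corrected read against its error-free version."""
    if counts is None:
        counts = BaseCounts()
    if not (len(raw) == len(corrected) == len(true)):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    for r, c, t in zip(raw, corrected, true):
        if r != t and c == t:
            counts.tp += 1
        elif r == t and c != t:
            counts.fp += 1
        elif r != t:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


if __name__ == "__main__":
    # Toy UMI group: two of the three reads carry one sequencing error each.
    group = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
    true = umi_consensus(group)  # "ACGTACGT"

    counts = BaseCounts()
    # Read 2: the tool fixes position 7 but introduces an error at position 4.
    classify_bases("ACGTACGA", "ACGTTCGT", true, counts)
    # Read 3: the tool leaves the error at position 2 uncorrected.
    classify_bases("ACCTACGT", "ACCTACGT", true, counts)

    print(counts)  # BaseCounts(tp=1, fp=1, fn=1, tn=13)
    print(counts.precision, counts.sensitivity)  # 0.5 0.5
```

In this scheme, a miscorrection that changes an erroneous base into a different wrong base counts only as a false negative; a stricter scheme might also count it as a false positive, which would lower precision.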
Pages: 1