Benchmarking of computational error-correction methods for next-generation sequencing data

Cited by: 0
Authors
Mitchell, Keith [1]
Brito, Jaqueline J. [2]
Mandric, Igor [1,3]
Wu, Qiaozhen [1]
Knyazev, Sergey [3]
Chang, Sei [1]
Martin, Lana S. [2]
Karlsberg, Aaron [2]
Gerasimov, Ekaterina [3]
Littman, Russell [1]
Hill, Brian L. [1]
Wu, Nicholas C. [4]
Yang, Harry [1]
Hsieh, Kevin [1]
Chen, Linus [1]
Littman, Eli [1]
Shabani, Taylor [1]
Enik, German [1]
Yao, Douglas [1]
Sun, Ren [1]
Schroeder, Jan [5]
Eskin, Eleazar [1]
Zelikovsky, Alex [6,7]
Skums, Pavel [3]
Pop, Mihai [8]
Mangul, Serghei [1,2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] USC, Los Angeles, CA USA
[3] Georgia State Univ, Atlanta, GA 30303 USA
[4] Scripps Res Inst, La Jolla, CA USA
[5] Monash Univ, Clayton, Vic, Australia
[6] GSU, Atlanta, GA USA
[7] MSMU, Moscow, Russia
[8] Univ Maryland, Baltimore, MD USA
DOI
10.1145/3388440.3414209
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error-correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. In this paper, we evaluate error-correction algorithms' ability to fix errors across different types of datasets that contain various levels of heterogeneity. We perform a realistic evaluation of several error correction tools. To measure the efficacy of these techniques, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. We also identify the techniques that offer a good balance between precision and sensitivity. This highlight showcases our paper's main findings [1], showing the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology.
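The abstract reports precision and sensitivity without defining how they are computed. The sketch below shows one common per-base formulation, assuming substitution errors only and reads of equal length. The function names and the TP/FP/FN/TN bookkeeping are illustrative assumptions, not the paper's actual evaluation code.

```python
# Illustrative sketch only: per-base scoring of an error-correction tool.
# Assumes substitution errors and equal-length reads; names are hypothetical.

def classify_bases(raw, corrected, truth):
    """Count per-base outcomes for one read.

    raw       -- the read as sequenced (may contain errors)
    corrected -- the read after an error-correction tool has run
    truth     -- the error-free reference read (e.g. a UMI-derived consensus)
    """
    if not (len(raw) == len(corrected) == len(truth)):
        raise ValueError("reads must have equal length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for r, c, t in zip(raw, corrected, truth):
        if r != t:
            # The raw base was an error: fixed -> TP, still wrong -> FN.
            counts["TP" if c == t else "FN"] += 1
        else:
            # The raw base was correct: altered -> FP, left alone -> TN.
            counts["FP" if c != t else "TN"] += 1
    return counts


def precision_sensitivity(counts):
    """Return (precision, sensitivity); 0.0 when a denominator is zero."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    truth = "ACGTACGT"
    raw = "ACGAACGA"        # errors at positions 3 and 7 (0-based)
    corrected = "ACGTACCA"  # fixes position 3, breaks position 6, misses 7
    counts = classify_bases(raw, corrected, truth)
    print(counts, precision_sensitivity(counts))
```

Under this formulation, a tool that changes few bases can reach high precision with low sensitivity, while an aggressive tool can show the reverse, which is why the two metrics are reported together.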
Pages: 1