Benchmarking of computational error-correction methods for next-generation sequencing data

Cited by: 0
Authors
Mitchell, Keith [1]
Brito, Jaqueline J. [2]
Mandric, Igor [1,3]
Wu, Qiaozhen [1]
Knyazev, Sergey [3]
Chang, Sei [1]
Martin, Lana S. [2]
Karlsberg, Aaron [2]
Gerasimov, Ekaterina [3]
Littman, Russell [1]
Hill, Brian L. [1]
Wu, Nicholas C. [4]
Yang, Harry [1]
Hsieh, Kevin [1]
Chen, Linus [1]
Littman, Eli [1]
Shabani, Taylor [1]
Enik, German [1]
Yao, Douglas [1]
Sun, Ren [1]
Schroeder, Jan [5]
Eskin, Eleazar [1]
Zelikovsky, Alex [6,7]
Skums, Pavel [3]
Pop, Mihai [8]
Mangul, Serghei [1,2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] USC, Los Angeles, CA USA
[3] Georgia State Univ, Atlanta, GA 30303 USA
[4] Scripps Res Inst, La Jolla, CA USA
[5] Monash Univ, Clayton, Vic, Australia
[6] GSU, Atlanta, GA USA
[7] MSMU, Moscow, Russia
[8] Univ Maryland, Baltimore, MD USA
DOI
10.1145/3388440.3414209
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error-correction algorithms remains unknown. In this paper, we evaluate the ability of error-correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We perform a realistic evaluation of several error-correction tools. To measure the efficacy of these techniques, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. In terms of accuracy, we find that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. We also identify the techniques that offer a good balance between precision and sensitivity. This highlight presents our paper's main findings [1], showing the advantages and limitations of computational error-correction techniques across different domains of biology, including immunogenomics and virology.
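The abstract refers to precision and sensitivity without defining them. The sketch below is one plausible per-base reading of those terms, not the paper's actual evaluation code. It assumes an error-free reference read is available for comparison, which is the role the UMI-based protocol appears to serve in the abstract. It also assumes substitution errors only and reads of equal length. The function names `classify_bases`, `precision`, and `sensitivity` are hypothetical.

```python
def classify_bases(true_read: str, raw_read: str, corrected_read: str) -> dict:
    """Count per-base outcomes for one read.

    Assumption: substitutions only, so all three reads have equal length.
    TP: an erroneous base was fixed; FN: an erroneous base was left wrong;
    FP: a correct base was altered; TN: a correct base was left untouched.
    """
    if not (len(true_read) == len(raw_read) == len(corrected_read)):
        raise ValueError("this simplified sketch requires equal-length reads")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, r, c in zip(true_read, raw_read, corrected_read):
        if r != t:  # the raw base is a sequencing error
            counts["TP" if c == t else "FN"] += 1
        else:       # the raw base was already correct
            counts["TN" if c == t else "FP"] += 1
    return counts


def precision(counts: dict) -> float:
    """Fraction of the tool's changes that were proper corrections."""
    denom = counts["TP"] + counts["FP"]
    return counts["TP"] / denom if denom else 0.0


def sensitivity(counts: dict) -> float:
    """Fraction of sequencing errors that the tool fixed."""
    denom = counts["TP"] + counts["FN"]
    return counts["TP"] / denom if denom else 0.0


# Toy example (made-up reads): two errors in the raw read; the hypothetical
# tool fixes one, misses one, and wrongly alters one correct base.
counts = classify_bases("ACGTACGT", "ACGAACGA", "TCGTACGA")
print(counts, precision(counts), sensitivity(counts))
```

Under these definitions, a tool that changes few bases can score high precision but low sensitivity, and an aggressive tool can do the reverse. That trade-off is the balance the abstract mentions.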
Pages: 1