Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data

Cited by: 4
Authors
Hall, Michael B. [1]
Wick, Ryan R. [1,2]
Judd, Louise M. [1,2]
Nguyen, An N. [1]
Steinig, Eike J. [1]
Xie, Ouli [3,4]
Davies, Mark [1]
Seemann, Torsten [1,2]
Stinear, Timothy P. [1,2]
Coin, Lachlan [1]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, Melbourne, Australia
[2] Univ Melbourne, Ctr Pathogen Genom, Melbourne, Australia
[3] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Infect Dis, Melbourne, Australia
[4] Monash Hlth, Monash Infect Dis, Melbourne, Vic, Australia
Source
eLife | 2024 / Volume 13
Funding
UK Medical Research Council
DOI
10.7554/eLife.98300
CLC classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and antimicrobial resistance detection. This study presents a comprehensive benchmarking of variant calling accuracy in bacterial genomes using Oxford Nanopore Technologies (ONT) sequencing data. We evaluated three ONT basecalling models and both simplex (single-strand) and duplex (dual-strand) read types across 14 diverse bacterial species. Our findings reveal that deep learning-based variant callers, particularly Clair3 and DeepVariant, significantly outperform traditional methods and even exceed the accuracy of Illumina sequencing, especially when applied to ONT's super-high accuracy model. ONT's superior performance is attributed to its ability to overcome Illumina's errors, which often arise from difficulties in aligning reads in repetitive and variant-dense genomic regions. Moreover, the use of high-performing variant callers with ONT's super-high accuracy data mitigates ONT's traditional errors in homopolymers. We also investigated the impact of read depth on variant calling, demonstrating that 10x depth of ONT super-accuracy data can achieve precision and recall comparable to, or better than, full-depth Illumina sequencing. These results underscore the potential of ONT sequencing, combined with advanced variant calling algorithms, to replace traditional short-read sequencing methods in bacterial genomics, particularly in resource-limited settings.
Pages: 23