Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data

Cited by: 4
Authors
Hall, Michael B. [1]
Wick, Ryan R. [1,2]
Judd, Louise M. [1,2]
Nguyen, An N. [1]
Steinig, Eike J. [1]
Xie, Ouli [3,4]
Davies, Mark [1]
Seemann, Torsten [1,2]
Stinear, Timothy P. [1,2]
Coin, Lachlan [1]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, Melbourne, Australia
[2] Univ Melbourne, Ctr Pathogen Genom, Melbourne, Australia
[3] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Infect Dis, Melbourne, Australia
[4] Monash Hlth, Monash Infect Dis, Melbourne, Vic, Australia
Source
eLife | 2024 / Vol. 13
Funding
UK Medical Research Council
DOI
10.7554/eLife.98300
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and antimicrobial resistance detection. This study presents a comprehensive benchmarking of variant calling accuracy in bacterial genomes using Oxford Nanopore Technologies (ONT) sequencing data. We evaluated three ONT basecalling models and both simplex (single-strand) and duplex (dual-strand) read types across 14 diverse bacterial species. Our findings reveal that deep learning-based variant callers, particularly Clair3 and DeepVariant, significantly outperform traditional methods and even exceed the accuracy of Illumina sequencing, especially when applied to ONT's super-high accuracy model. ONT's superior performance is attributed to its ability to overcome Illumina's errors, which often arise from difficulties in aligning reads in repetitive and variant-dense genomic regions. Moreover, the use of high-performing variant callers with ONT's super-high accuracy data mitigates ONT's traditional errors in homopolymers. We also investigated the impact of read depth on variant calling, demonstrating that 10x depth of ONT super-accuracy data can achieve precision and recall comparable to, or better than, full-depth Illumina sequencing. These results underscore the potential of ONT sequencing, combined with advanced variant calling algorithms, to replace traditional short-read sequencing methods in bacterial genomics, particularly in resource-limited settings.
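The abstract reports results in terms of precision and recall. As a minimal sketch of how those two metrics are defined for a set of variant calls, the following Python compares calls against a truth set by exact match. This is an illustration only, not the benchmarking procedure used in the study: the `score_calls` helper, the `Variant` tuple layout, and the toy variants are all hypothetical, and real evaluations typically normalise variant representation before comparing.

```python
from typing import NamedTuple

# Hypothetical variant key: (contig, 1-based position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(truth: set[Variant], calls: set[Variant]) -> Metrics:
    """Score variant calls against a truth set using exact matching."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called

    # Guard against empty inputs to avoid division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return Metrics(precision, recall, f1)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    truth = {
        ("chr1", 10, "A", "G"),
        ("chr1", 20, "C", "T"),
        ("chr1", 30, "G", "A"),
        ("chr1", 40, "T", "C"),
    }
    calls = {
        ("chr1", 10, "A", "G"),
        ("chr1", 20, "C", "T"),
        ("chr1", 30, "G", "A"),
        ("chr1", 55, "A", "C"),  # false positive
    }
    print(score_calls(truth, calls))
```

Precision is the fraction of calls that are correct, and recall is the fraction of true variants that were found, so a caller can trade one against the other; F1 summarises both in a single number.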
Pages: 23