SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

Cited by: 84
Authors
Epping, Lennard [1, 2]
van Tonder, Andries J. [3 ]
Gladstone, Rebecca A. [3 ]
Bentley, Stephen D. [3 ]
Page, Andrew J. [1, 4]
Keane, Jacqueline A. [1 ]
Affiliations
[1] Wellcome Sanger Inst, Pathogen Informat, Hinxton CB10 1SA, Cambs, England
[2] Robert Koch Inst, Microbial Genom, Berlin, Germany
[3] Wellcome Sanger Inst, Infect Genom, Hinxton CB10 1SA, Cambs, England
[4] Norwich Res Pk, Quadram Inst, Norwich, Norfolk, England
Source
MICROBIAL GENOMICS | 2018 / Volume 4 / Issue 07
Funding
Wellcome Trust (UK);
Keywords
Streptococcus pneumoniae; serotyping; pneumococcal; whole genome sequencing; k-mer method; PNEUMOCOCCAL DISEASE; VACCINATION; DISCOVERY; CHILDREN; LOCUS; PCR;
DOI
10.1099/mgen.0.000186
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Streptococcus pneumoniae is responsible for 240 000-460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15-21x. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sangerpathogens/seroba
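The k-mer idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not SeroBA's implementation; it only shows the general principle of scoring candidate reference sequences by how many of their k-mers are present in a set of reads. The function names (`kmers`, `kmer_coverage`, `best_match`), the toy sequences, the labels `typeA` and `typeB`, and the choice of k = 5 are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_coverage(reads, reference, k):
    """Return the fraction of the reference's distinct k-mers found in the reads."""
    read_kmers = set()
    for read in reads:
        read_kmers.update(kmers(read, k))
    ref_kmers = set(kmers(reference, k))
    if not ref_kmers:
        return 0.0
    return len(ref_kmers & read_kmers) / len(ref_kmers)


def best_match(reads, references, k=5):
    """Score each named reference and return the best-covered name with all scores."""
    scores = {name: kmer_coverage(reads, seq, k) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: two made-up reference sequences and three short reads
# that were cut from the first reference.
references = {
    "typeA": "ATGGCTAGCTAGGATCCGATCGATCG",
    "typeB": "ATGTTTCCCGGGAAATTTCCCGGGAA",
}
reads = ["ATGGCTAGCTAGG", "CTAGGATCCGATC", "CCGATCGATCG"]

best, scores = best_match(reads, references, k=5)
print(best, scores)
```

Real read data would call for a much larger k and for handling of reverse complements and sequencing errors, none of which this sketch attempts.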
Pages: 6