SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

Cited by: 84
Authors
Epping, Lennard [1 ,2 ]
van Tonder, Andries J. [3 ]
Gladstone, Rebecca A. [3 ]
Bentley, Stephen D. [3 ]
Page, Andrew J. [1 ,4 ]
Keane, Jacqueline A. [1 ]
Affiliations
[1] Wellcome Sanger Inst, Pathogen Informat, Hinxton CB10 1SA, Cambs, England
[2] Robert Koch Inst, Microbial Genom, Berlin, Germany
[3] Wellcome Sanger Inst, Infect Genom, Hinxton CB10 1SA, Cambs, England
[4] Norwich Res Pk, Quadram Inst, Norwich, Norfolk, England
Source
MICROBIAL GENOMICS | 2018 / Volume 4 / Issue 07
Funding
Wellcome Trust (UK);
Keywords
Streptococcus pneumoniae; serotyping; pneumococcal; whole genome sequencing; k-mer method; PNEUMOCOCCAL DISEASE; VACCINATION; DISCOVERY; CHILDREN; LOCUS; PCR;
DOI
10.1099/mgen.0.000186
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Streptococcus pneumoniae is responsible for 240 000-460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15-21x. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sangerpathogens/seroba
Pages: 6