HashSeq: a Simple, Scalable, and Conservative De Novo Variant Caller for 16S rRNA Gene Data Sets

Citations: 1
Authors
Fouladi, Farnaz [1]
Young, Jacqueline B. [1]
Fodor, Anthony A. [1]
Affiliations
[1] University of North Carolina at Charlotte, Department of Bioinformatics and Genomics, Charlotte, NC 28223, USA
Keywords
16S rRNA gene sequence variant; microbiome; sequence variant; sequencing error; SILVA
DOI
10.1128/mSystems.00697-21
Chinese Library Classification
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
16S rRNA gene sequencing is a common and cost-effective technique for characterizing microbial communities. Recent bioinformatics methods enable high-resolution detection of sequence variants that differ by only one nucleotide. In this study, we used a very fast HashMap-based approach to detect sequence variants in six publicly available 16S rRNA gene data sets. We then used the normal distribution combined with locally estimated scatterplot smoothing (LOESS) regression to estimate background error rates as a function of sequencing depth for individual clusters of sequences. This method is computationally efficient and yields sets of variants that are conservative and well supported by reference databases. We argue that this approach to inference is fast, simple, and scalable to large data sets and provides a high-resolution set of sequence variants that are less likely to be the result of sequencing error.
IMPORTANCE Recent bioinformatics developments have enabled the detection of sequence variants at a resolution of a single-nucleotide difference in 16S rRNA gene sequence data. Despite this progress, variant calling pipelines can have several limitations, such as a slow runtime or the production of a large number of low-abundance sequence variants that must be filtered out with arbitrary thresholds in downstream analyses. In this report, we introduce a fast and scalable algorithm that infers sequence variants by estimating a normally distributed background error as a function of sequencing depth. Our pipeline has attractive performance characteristics, can be used independently or in parallel with other variant callers, and provides an explicit P value for each variant, evaluating the hypothesis that the variant is caused by sequencing error.
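The two ideas named in the abstract, hash-map counting of sequences and a normal-distribution test against a background error level, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the grouping of reads into clusters is reduced to a one-mismatch comparison against a parent sequence, and the background mean and standard deviation are passed in by hand, whereas the abstract describes estimating them with LOESS regression as a function of sequencing depth.

```python
from collections import Counter
import math


def count_sequences(reads):
    """Tally identical reads with a hash map (Counter is a dict subclass)."""
    return Counter(reads)


def one_mismatch_children(parent, counts):
    """Return {sequence: count} for sequences that have the same length as
    `parent` and differ from it at exactly one position."""
    children = {}
    for seq, n in counts.items():
        if len(seq) != len(parent) or seq == parent:
            continue
        mismatches = sum(a != b for a, b in zip(seq, parent))
        if mismatches == 1:
            children[seq] = n
    return children


def error_p_value(observed, background_mean, background_sd):
    """One-sided upper-tail P value for `observed` under a normal background.

    ASSUMPTION: background_mean and background_sd are supplied by the caller;
    this sketch does not fit them from the data.
    """
    z = (observed - background_mean) / background_sd
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Toy reads: one abundant parent and two one-mismatch candidates.
    reads = ["ACGT"] * 90 + ["ACGA"] * 9 + ["ACCT"]
    counts = count_sequences(reads)
    parent, _ = counts.most_common(1)[0]
    for seq, n in one_mismatch_children(parent, counts).items():
        # Hypothetical background: mean 1 read, SD 1 read at this depth.
        print(seq, n, error_p_value(n, 1.0, 1.0))
```

A small P value means the candidate's abundance is hard to explain by the assumed background error alone; a candidate sitting at the background mean gets a P value of 0.5 and would not be called as a variant.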
Pages: 14