HashSeq: a Simple, Scalable, and Conservative De Novo Variant Caller for 16S rRNA Gene Data Sets

Cited by: 1
Authors
Fouladi, Farnaz [1]
Young, Jacqueline B. [1]
Fodor, Anthony A. [1]
Affiliations
[1] University of North Carolina at Charlotte, Department of Bioinformatics and Genomics, Charlotte, NC 28223, USA
Keywords
16S rRNA gene sequence variant; microbiome; sequence variant; sequencing error; SILVA
DOI
10.1128/mSystems.00697-21
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
16S rRNA gene sequencing is a common and cost-effective technique for characterizing microbial communities. Recent bioinformatics methods enable high-resolution detection of sequence variants that differ by only one nucleotide. In this study, we use a very fast HashMap-based approach to detect sequence variants in six publicly available 16S rRNA gene data sets. We then use the normal distribution combined with locally estimated scatterplot smoothing (LOESS) regression to estimate background error rates as a function of sequencing depth for individual clusters of sequences. This method is computationally efficient and yields sets of variants that are conservative and well supported by reference databases. We argue that this approach to inference is fast, simple, and scalable to large data sets and provides a high-resolution set of sequence variants that are less likely to be the result of sequencing error.

IMPORTANCE Recent developments in bioinformatics have enabled the detection of sequence variants in 16S rRNA gene sequence data at a resolution of a single-nucleotide difference. Despite this progress, variant calling pipelines can have several limitations, such as a slow runtime or the production of a large number of low-abundance sequence variants that must be filtered out with arbitrary thresholds in downstream analyses. In this report, we introduce a fast and scalable algorithm that infers sequence variants by estimating a normally distributed background error as a function of sequencing depth. Our pipeline has attractive performance characteristics, can be used independently or in parallel with other variant callers, and provides an explicit P value for each variant, evaluating the hypothesis that the variant is caused by sequencing error.
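The two ideas named in the abstract, hash-map counting of identical sequences and a normal model of background error, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the choice to compare each sequence with its most abundant one-mismatch neighbor, and the fixed `error_mean` and `error_sd` values are all assumptions made for illustration. In particular, the LOESS step that estimates the error as a function of sequencing depth is omitted and replaced by constants.

```python
from collections import Counter
import math


def count_sequences(reads):
    """Tally identical reads with a hash map (Counter is a dict subclass)."""
    return Counter(reads)


def one_mismatch_neighbors(seq, alphabet="ACGT"):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def normal_upper_tail(x, mean, sd):
    """Return P(X >= x) for X ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


def error_p_values(counts, error_mean, error_sd):
    """P value per candidate variant under the hypothesis of sequencing error.

    Only sequences with a more abundant one-mismatch neighbor (a hypothetical
    "parent") are tested. error_mean and error_sd describe an assumed
    background error, expressed as a fraction of the parent's abundance;
    they are illustrative constants, not estimates from real data.
    """
    p_values = {}
    for seq, n in counts.items():
        parent_counts = [
            counts.get(neighbor, 0)
            for neighbor in one_mismatch_neighbors(seq)
            if counts.get(neighbor, 0) > n
        ]
        if not parent_counts:
            continue  # no more abundant neighbor, so nothing to test against
        ratio = n / max(parent_counts)
        p_values[seq] = normal_upper_tail(ratio, error_mean, error_sd)
    return p_values


if __name__ == "__main__":
    reads = ["ACGT"] * 1000 + ["ACGA"] * 2 + ["ACGC"] * 200
    counts = count_sequences(reads)
    for seq, p in error_p_values(counts, error_mean=0.005, error_sd=0.002).items():
        print(seq, counts[seq], p)
```

In this toy input, the rare sequence sits within the assumed error range of its abundant neighbor and receives a large P value, while the sequence at 20% of the parent's abundance receives a P value near zero and would be kept as a variant. Looking up neighbors in the hash map costs time proportional to sequence length per sequence, which is the property that makes this style of approach scale to large data sets.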
Pages: 14