Systematic processing of ribosomal RNA gene amplicon sequencing data

Cited by: 47
Authors
Tremblay, Julien [1]
Yergeau, Etienne [2]
Affiliations
[1] Natl Res Council Canada, Energy Min & Environm, 6100 Royalmount Ave, Montreal, PQ H4P 2R2, Canada
[2] Inst Natl Rech Sci, Ctr INRS, Inst Armand Frappier, 531Ad Boul Prairies, Laval, PQ H7V 1B7, Canada
Source
GIGASCIENCE | 2019 / Vol. 8 / Issue 12
Keywords
rRNA gene amplicons; bioinformatics; metagenomics; High Performance Computing; GENOMICS;
DOI
10.1093/gigascience/giz146
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07; 0710; 09;
Abstract
Background: With the advent of high-throughput sequencing, microbiology is becoming increasingly data-intensive. Because of its low cost, robust databases, and established bioinformatic workflows, sequencing of 16S/18S/ITS ribosomal RNA (rRNA) gene amplicons, which provides a marker of choice for phylogenetic studies, has become ubiquitous. Many established end-to-end bioinformatic pipelines are available to perform short amplicon sequence data analysis. These pipelines suit a general audience, but few options exist for more specialized users who are experienced in code scripting, Linux-based systems, and high-performance computing (HPC) environments. For such an audience, existing pipelines can be limiting to fully leverage modern HPC capabilities and perform tweaking and optimization operations. Moreover, a wealth of stand-alone software packages that perform specific targeted bioinformatic tasks are increasingly accessible, and finding a way to easily integrate these applications in a pipeline is critical to the evolution of bioinformatic methodologies.
Results: Here we describe AmpliconTagger, a short rRNA marker gene amplicon pipeline coded in a Python framework that enables fine tuning and integration of virtually any potential rRNA gene amplicon bioinformatic procedure. It is designed to work within an HPC environment, supporting a complex network of job dependencies with a smart-restart mechanism in case of job failure or parameter modifications. As proof of concept, we present end results obtained with AmpliconTagger using 16S, 18S, ITS rRNA short gene amplicons and Pacific Biosciences long-read amplicon data types as input.
Conclusions: Using a selection of published algorithms for generating operational taxonomic units and amplicon sequence variants and for computing downstream taxonomic summaries and diversity metrics, we demonstrate the performance and versatility of our pipeline for systematic analyses of amplicon sequence data.
Pages: 14