isolateR: an R package for generating microbial libraries from Sanger sequencing data

Cited by: 0
Authors
Daisley, Brendan [1]
Vancuren, Sarah J. [1]
Brettingham, Dylan J. L. [1]
Wilde, Jacob [1]
Renwick, Simone [2,3]
Macpherson, Christine V. [1]
Good, David A. [1]
Botschner, Alexander J. [1]
Yen, Sandi [4]
Hill, Janet E. [5]
Sorbara, Matthew T. [1]
Allen-Vercoe, Emma [1]
Affiliations
[1] Univ Guelph, Dept Mol & Cellular Biol, Guelph, ON N1G 2W1, Canada
[2] Univ Calif San Diego, Sch Med, Dept Pediat, San Diego, CA USA
[3] Univ Calif San Diego, Larsson Rosenquist Fdn, Human Milk Inst HMI, Mother Milk Infant Ctr Res Excellence MOMI CORE, La Jolla, CA 92123 USA
[4] Univ Oxford, Kennedy Inst Rheumatol, Med Sci Div, Oxford OX1 2JD, England
[5] Univ Saskatchewan, Dept Vet Microbiol, Saskatoon, SK S7N 5B4, Canada
Keywords
16S RIBOSOMAL-RNA; CLASSIFICATION; IDENTIFICATION; SOFTWARE; STRAINS; SEARCH; LPSN
DOI
10.1093/bioinformatics/btae448
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Sanger sequencing of taxonomic marker genes (e.g. 16S/18S/ITS/rpoB/cpn60) represents the leading method for identifying a wide range of microorganisms including bacteria, archaea, and fungi. However, the manual processing of sequence data and limitations associated with conventional BLAST searches impede the efficient generation of strain libraries essential for cataloging microbial diversity and discovering novel species. Results: isolateR addresses these challenges by implementing a standardized and scalable three-step pipeline that includes: (1) automated batch processing of Sanger sequence files, (2) taxonomic classification via global alignment to type strain databases in accordance with the latest international nomenclature standards, and (3) straightforward creation of strain libraries and handling of clonal isolates, with the ability to set customizable sequence dereplication thresholds and combine data from multiple sequencing runs into a single library. The tool's user-friendly design also features interactive HTML outputs that simplify data exploration and analysis. Additionally, in silico benchmarking on two comprehensive human gut genome catalogues (IMGG and Hadza hunter-gatherer populations) showcases the proficiency of isolateR in uncovering and cataloging the nuanced spectrum of microbial diversity, advocating for a more targeted and granular exploration within individual hosts to achieve the highest strain-level resolution possible when generating culture collections. Availability and implementation: isolateR is available at: https://github.com/bdaisley/isolateR.
Pages: 11
Related papers
50 in total
  • [1] MicroNiche: an R package for assessing microbial niche breadth and overlap from amplicon sequencing data
    Finn, D. R.
    Yu, J.
    Ilhan, Z. E.
    Fernandes, V. M. C.
    Penton, C. R.
    Krajmalnik-Brown, R.
    Garcia-Pichel, F.
    Vogel, T. M.
    [J]. FEMS MICROBIOLOGY ECOLOGY, 2020, 96 (08)
  • [2] neuRosim: An R Package for Generating fMRI Data
    Welvaert, Marijke
    Durnez, Joke
    Moerkerke, Beatrijs
    Verdoolaege, Geert
    Rosseel, Yves
    [J]. JOURNAL OF STATISTICAL SOFTWARE, 2011, 44 (10): 1-18
  • [3] sangeranalyseR: Simple and Interactive Processing of Sanger Sequencing Data in R
    Chao, Kuan-Hao
    Barton, Kirston
    Palmer, Sarah
    Lanfear, Robert
    [J]. GENOME BIOLOGY AND EVOLUTION, 2021, 13 (03)
  • [4] MultiOrd: An R Package for Generating Correlated Ordinal Data
    Amatya, Anup
    Demirtas, Hakan
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2015, 44 (07): 1683-1691
  • [5] lab: an R package for generating analysis-ready data from laboratory records
    Tseng, Yi-Ju
    Chen, Chun Ju
    Chang, Chia Wei
    [J]. PEERJ COMPUTER SCIENCE, 2023, 9
  • [6] poRe: an R package for the visualization and analysis of nanopore sequencing data
    Watson, Mick
    Thomson, Marian
    Risse, Judith
    Talbot, Richard
    Santoyo-Lopez, Javier
    Gharbi, Karim
    Blaxter, Mark
    [J]. BIOINFORMATICS, 2015, 31 (01): 114-115
  • [7] TSSr: an R package for comprehensive analyses of TSS sequencing data
    Lu, Zhaolian
    Berry, Keenan
    Hu, Zhenbin
    Zhan, Yu
    Ahn, Tae-Hyuk
    Lin, Zhenguo
    [J]. NAR GENOMICS AND BIOINFORMATICS, 2021, 3 (04)
  • [8] ClassNoise: An R package for modeling, generating, and validating data with class noise
    Martinez-Galicia, David
    Guerra-Hernandez, Alejandro
    Grimaldo, Francisco
    Cruz-Ramirez, Nicandro
    Limon, Xavier
    [J]. SOFTWAREX, 2024, 26
  • [9] microeco: an R package for data mining in microbial community ecology
    Liu, Chi
    Cui, Yaoming
    Li, Xiangzhen
    Yao, Minjie
    [J]. FEMS MICROBIOLOGY ECOLOGY, 2021, 97 (02)
  • [10] gcplyr: an R package for microbial growth curve data analysis
    Blazanin, Michael
    [J]. BMC BIOINFORMATICS, 2024, 25 (01)