SCRAPT: an iterative algorithm for clustering large 16S rRNA gene data sets

Cited by: 1
Authors
Luan, Tu [1, 2]
Muralidharan, Harihara Subrahmaniam [1, 2]
Alshehri, Marwan [1]
Mittra, Ipsa [1]
Pop, Mihai [1, 2]
Affiliations
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[2] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
Funding
National Institutes of Health (USA)
Keywords
IDENTIFICATION; INFERENCE; CATALOG; EST;
DOI
10.1093/nar/gkad158
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
16S rRNA gene sequence clustering is an important tool in characterizing the diversity of microbial communities. As 16S rRNA gene data sets grow in size, existing sequence clustering algorithms increasingly become an analytical bottleneck. Part of this bottleneck is due to the substantial computational cost expended on small clusters and singleton sequences. We propose an iterative sampling-based 16S rRNA gene sequence clustering approach that targets the largest clusters in the data set, allowing users to stop the clustering process when sufficient clusters are available for the specific analysis being targeted. We describe a probabilistic analysis of the iterative clustering process that supports the intuition that the clustering process identifies the larger clusters in the data set first. Using real data sets of 16S rRNA gene sequences, we show that the iterative algorithm, coupled with an adaptive sampling process and a mode-shifting strategy for identifying cluster representatives, substantially speeds up the clustering process while being effective at capturing the large clusters in the data set. The experiments also show that SCRAPT (Sample, Cluster, Recruit, AdaPt and iTerate) is able to produce operational taxonomic units that are less fragmented than those produced by popular tools: UCLUST, CD-HIT and DNACLUST. The algorithm is implemented in the open-source package SCRAPT. The source code used to generate the results presented in this paper is available at.
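The sample, cluster, recruit and iterate loop described in the abstract can be illustrated with a toy sketch. This is not the SCRAPT implementation: it omits the adaptive sampling and mode-shifting steps, and it substitutes a simple positional identity on equal-length strings for a real sequence similarity measure. All names (`identity`, `greedy_cluster`, `iterative_cluster`) and parameters (`threshold`, `sample_size`, `min_hits`, `max_iters`) are hypothetical and chosen only for illustration.

```python
import random


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy stand-in for sequence similarity)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Greedy clustering: a sequence joins the first center it matches,
    otherwise it becomes the center of a new cluster."""
    clusters = []  # list of (center, members)
    for s in seqs:
        for center, members in clusters:
            if identity(center, s) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters


def iterative_cluster(seqs, threshold=0.9, sample_size=20,
                      min_hits=2, max_iters=10, seed=0):
    """Sample, cluster the sample, recruit from the full set, iterate.

    Sample clusters with fewer than `min_hits` members are skipped, so
    effort is concentrated on clusters large enough to show up in a sample.
    Returns (found_clusters, unclustered_sequences).
    """
    rng = random.Random(seed)
    remaining = list(seqs)
    found = []
    for _ in range(max_iters):
        if not remaining:
            break
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        sample_clusters = greedy_cluster(sample, threshold)
        # Process the largest sample clusters first.
        sample_clusters.sort(key=lambda c: len(c[1]), reverse=True)
        for center, members in sample_clusters:
            if len(members) < min_hits:
                continue
            # Recruit every remaining sequence close to this center.
            recruited = [s for s in remaining if identity(center, s) >= threshold]
            if not recruited:
                continue
            remaining = [s for s in remaining if identity(center, s) < threshold]
            found.append((center, recruited))
    return found, remaining


if __name__ == "__main__":
    data = ["ACGTACGTAC"] * 30 + ["TTTTTTTTTT"] * 3 + ["GGGGGGGGGG"]
    clusters, leftover = iterative_cluster(data, sample_size=10)
    for center, members in clusters:
        print(center, len(members))
    print("unclustered:", leftover)
```

In this toy input the 30-member cluster dominates any sample of 10, so it is recruited first, while the singleton is never clustered and is returned as unclustered. Stopping after a fixed number of iterations mirrors the idea that a user may halt once enough large clusters are available.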
Pages: 12