CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Cited by: 366
Authors
Ounit, Rachid [1]
Wanamaker, Steve [2]
Close, Timothy J. [2]
Lonardi, Stefano [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Riverside, Dept Plant & Bot Sci, Riverside, CA 92521 USA
Source
BMC GENOMICS | 2015 / Volume 16
Funding
US National Science Foundation;
Keywords
Metagenomics; Genomics; Arm/chromosome assignments; Discriminative k-mers; Sequence-specific k-mers; Chromosome arm; Centromere; IDENTIFICATION;
DOI
10.1186/s12864-015-1419-2
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. Results: We introduce CLARK, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to that of the best state-of-the-art tools, and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions. Conclusions: CLARK is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/.
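The title refers to classification with discriminative k-mers, which the abstract does not spell out. The following is a minimal Python sketch of the general idea only, under stated assumptions: a k-mer is treated as discriminative when it occurs in exactly one target sequence, and a read is assigned to the target with the most discriminative k-mer hits. The function names, the toy sequences, and the tie-breaking behavior are hypothetical illustrations and are not taken from the CLARK software.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target's label.

    targets: dict of label -> reference sequence (hypothetical toy input).
    k-mers shared by two or more targets are discarded.
    """
    owners = defaultdict(set)
    for label, seq in targets.items():
        for km in kmers(seq, k):
            owners[km].add(label)
    return {km: next(iter(labels))
            for km, labels in owners.items() if len(labels) == 1}


def classify(read, index, k):
    """Return the label with the most discriminative k-mer hits, or None.

    Assumption: ties go to whichever label was hit first; reverse
    complements and sequencing errors are ignored in this toy version.
    """
    hits = defaultdict(int)
    for km in kmers(read, k):
        label = index.get(km)
        if label is not None:
            hits[label] += 1
    if not hits:
        return None
    return max(hits, key=hits.get)


if __name__ == "__main__":
    toy_targets = {"A": "ACGTACGTAA", "B": "TTGCATTGCA"}
    toy_index = build_discriminative_index(toy_targets, k=4)
    print(classify("ACGTAC", toy_index, k=4))
    print(classify("GGGGGG", toy_index, k=4))
```

A real classifier would also need to handle reverse-complement strands and much larger k-mer tables; this sketch keeps everything in a plain dictionary for readability.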
Pages: 13