CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Cited by: 366
Authors
Ounit, Rachid [1]
Wanamaker, Steve [2]
Close, Timothy J. [2]
Lonardi, Stefano [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Riverside, Dept Plant & Bot Sci, Riverside, CA 92521 USA
Source
BMC GENOMICS | 2015 / Vol. 16
Funding
US National Science Foundation
Keywords
Metagenomics; Genomics; Arm/chromosome assignments; Discriminative k-mers; Sequence-specific k-mers; Chromosome arm; Centromere; IDENTIFICATION;
DOI
10.1186/s12864-015-1419-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. Results: We introduce CLARK, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to that of the best state-of-the-art tools, and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions. Conclusions: CLARK is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/.
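The abstract does not spell out how discriminative k-mers drive classification, so the following Python snippet is only a minimal sketch of the general idea, not CLARK's actual implementation. It assumes that a k-mer is "discriminative" when it occurs in exactly one target sequence, and that a read is assigned to the target collecting the most discriminative k-mer hits. The function names, the toy sequences and the choice k = 4 are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target's label.

    `targets` is a dict {label: reference sequence} (toy stand-in for
    reference genomes). K-mers shared by two or more targets are dropped.
    """
    owner = {}
    shared = set()
    for label, seq in targets.items():
        for km in kmers(seq, k):
            if km in shared:
                continue
            if km in owner and owner[km] != label:
                # Seen in another target: no longer discriminative.
                del owner[km]
                shared.add(km)
            else:
                owner[km] = label
    return owner


def classify(read, index, k):
    """Assign a read to the target with the most discriminative k-mer hits.

    Returns None when the read has no hits or the top two targets tie.
    """
    hits = Counter()
    for i in range(len(read) - k + 1):
        label = index.get(read[i:i + k])
        if label is not None:
            hits[label] += 1
    if not hits:
        return None
    ranked = hits.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Toy usage with made-up sequences (assumption: not real genomes).
toy_targets = {"A": "ACGTACGTGG", "B": "TTGACGTACC"}
toy_index = build_discriminative_index(toy_targets, 4)
for toy_read in ("CGTGG", "TTGAC", "ACGTA"):
    print(toy_read, classify(toy_read, toy_index, 4))
```

In this sketch, a read made up only of k-mers shared between targets is left unassigned, which illustrates why shared k-mers carry no classification signal.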
Pages: 13