CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Cited by: 366
Authors
Ounit, Rachid [1]
Wanamaker, Steve [2]
Close, Timothy J. [2]
Lonardi, Stefano [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Riverside, Dept Plant & Bot Sci, Riverside, CA 92521 USA
Source
BMC GENOMICS | 2015 / Vol. 16
Funding
National Science Foundation (USA);
Keywords
Metagenomics; Genomics; Arm/chromosome assignments; Discriminative k-mers; Sequence-specific k-mers; Chromosome arm; Centromere; IDENTIFICATION;
DOI
10.1186/s12864-015-1419-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. Results: We introduce CLARK, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to the best state-of-the-art tools, and it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions. Conclusions: CLARK is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/.
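To make the idea of classifying with discriminative k-mers concrete, here is a minimal Python sketch. It is not CLARK's implementation: the function names (`kmers`, `build_discriminative_index`, `classify`), the toy sequences, and the choice of k = 4 are all assumptions made for illustration. It treats a k-mer as discriminative when it occurs in exactly one target sequence, and it assigns a read to the target that collects the most such k-mer hits. Reverse complements, confidence scores, and memory-efficient storage are deliberately left out.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target's label.

    `targets` is a dict of label -> reference sequence (hypothetical input
    format). K-mers shared by two or more targets are dropped.
    """
    owners = defaultdict(set)
    for label, seq in targets.items():
        for km in kmers(seq, k):
            owners[km].add(label)
    return {km: next(iter(labels))
            for km, labels in owners.items()
            if len(labels) == 1}


def classify(read, index, k):
    """Return the label with the most discriminative k-mer hits.

    Returns None when the read contains no discriminative k-mer.
    Ties are broken by whichever label was hit first.
    """
    hits = defaultdict(int)
    for km in kmers(read, k):
        label = index.get(km)
        if label is not None:
            hits[label] += 1
    if not hits:
        return None
    return max(hits, key=hits.get)


if __name__ == "__main__":
    # Toy references that share the prefix ACGTACG and then diverge.
    targets = {"A": "ACGTACGTTTGACA", "B": "ACGTACGGGCATCA"}
    index = build_discriminative_index(targets, k=4)
    for read in ("CGTTTGAC", "GGGCATC", "ACGTACG"):
        print(read, classify(read, index, k=4))
```

In this toy setup, a read drawn from the shared prefix is left unassigned because every one of its k-mers occurs in both targets, which is the behavior one would want from an index restricted to target-specific k-mers.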
Pages: 13