CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Cited by: 366
Authors
Ounit, Rachid [1]
Wanamaker, Steve [2]
Close, Timothy J. [2]
Lonardi, Stefano [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Riverside, Dept Plant & Bot Sci, Riverside, CA 92521 USA
Source
BMC GENOMICS | 2015 / Vol. 16
Funding
National Science Foundation (USA);
Keywords
Metagenomics; Genomics; Arm/chromosome assignments; Discriminative k-mers; Sequence-specific k-mers; Chromosome arm; Centromere; IDENTIFICATION;
DOI
10.1186/s12864-015-1419-2
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. Results: We introduce CLARK, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to that of the best state-of-the-art tools, and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions. Conclusions: CLARK is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/.
Pages: 13
Related papers
50 in total
  • [41] Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences
    Edgar, Robert
    PEERJ, 2021, 9
  • [42] A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network
    Jianghui Wen
    Yeshu Liu
    Yu Shi
    Haoran Huang
    Bing Deng
    Xinping Xiao
    BMC Bioinformatics, 20
  • [43] Joker de Bruijn: Covering k-Mers Using Joker Characters
    Orenstein, Yaron
    Yu, Yun William
    Berger, Bonnie
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2018, 25 (11) : 1171 - 1178
  • [44] Fleximer: Accurate Quantification of RNA-Seq via Variable-Length k-mers
    Ju, Chelsea J. -T.
    Li, Ruirui
    Wu, Zhengliang
    Jiang, Jyun-Yu
    Yang, Zhao
    Wang, Wei
    ACM-BCB' 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2017, : 263 - 272
  • [45] De Novo Draft Genome Assembly Using Fuzzy K-mers
    Healy, John
    Chambers, Desmond
    BIOTECHNO 2011: THE THIRD INTERNATIONAL CONFERENCE ON BIOINFORMATICS, BIOCOMPUTATIONAL SYSTEMS AND BIOTECHNOLOGIES, 2011, : 104 - 109
  • [46] A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
    Marcais, Guillaume
    Kingsford, Carl
    BIOINFORMATICS, 2011, 27 (06) : 764 - 770
  • [47] Flexible k-mers with variable-length indels for identifying binding sequences of protein dimers
    Hong, Chenyang
    Yip, Kevin Y.
    BRIEFINGS IN BIOINFORMATICS, 2020, 21 (05) : 1787 - 1797
  • [48] Scalable Genomic Assembly through Parallel de Bruijn Graph Construction for Multiple K-mers
    Mahadik, Kanak
    Wright, Christopher
    Kulkarni, Milind
    Bagchi, Saurabh
    Chaterji, Somali
    ACM-BCB' 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2017, : 425 - 431
  • [49] DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences
    Tarini Shankar Ghosh
    Monzoorul Haque M
    Sharmila S Mande
    BMC Bioinformatics, 11
  • [50] Measuring the Invisible: The Sequences Causal of Genome Size Differences in Eyebrights (Euphrasia) Revealed by k-mers
    Becher, Hannes
    Sampson, Jacob
    Twyford, Alex D.
    FRONTIERS IN PLANT SCIENCE, 2022, 13