K2Mem: Discovering Discriminative K-mers From Sequencing Data for Metagenomic Reads Classification

Cited by: 6
Authors
Storato, Davide [1]
Comin, Matteo [2]
Affiliations
[1] Univ Padua, Dept Mol Med, I-35100 Padua, Italy
[2] Univ Padua, Dept Informat Engn, I-35100 Padua, Italy
Keywords
Metagenomic reads classification; discriminative k-mers; minimizers; SENSITIVE CLASSIFICATION; MAMMALIAN ENHANCERS;
DOI
10.1109/TCBB.2021.3117406
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
The major problem when analyzing a metagenomic sample is taxonomically annotating its reads to identify the species they contain. Most currently available methods classify reads using a set of reference genomes and their k-mers. In terms of precision these methods are close to perfect, but in terms of recall (the number of reads actually classified) performance falls to around 50%. One reason is that the sequences in a sample can differ substantially from the corresponding reference genome; viral genomes, for example, are highly mutated. To address this issue, this paper studies metagenomic reads classification by improving the reference k-mer library with novel discriminative k-mers drawn from the input sequencing reads. We evaluated performance under different conditions against several other tools, and the results showed an improved F-measure, especially when close reference genomes are not available.
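The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the K2Mem implementation: the function names (`build_library`, `classify`, `augment_library`), the majority-vote rule, and the single augmentation pass are all assumptions made for illustration, and reverse complements and minimizers are ignored. It shows a k-mer that is absent from the references being learned from a classified read and then helping to classify a read that previously had no hits.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_library(references, k):
    """Map each reference k-mer to its taxon, or to None if several taxa share it."""
    library = {}
    for taxon, genome in references.items():
        for km in set(kmers(genome, k)):
            if km in library and library[km] != taxon:
                library[km] = None  # shared k-mer: not discriminative
            else:
                library[km] = taxon
    return library


def classify(read, library, k):
    """Return the taxon with the most discriminative k-mer hits, or None if no hits."""
    votes = Counter(
        library[km] for km in kmers(read, k) if library.get(km) is not None
    )
    return votes.most_common(1)[0][0] if votes else None


def augment_library(reads, library, k):
    """Add k-mers unseen in the references, labelled with the taxon of the read
    they came from (hypothetical rule). A new k-mer seen in reads assigned to
    different taxa is marked None, i.e. non-discriminative."""
    novel = {}
    for read in reads:
        taxon = classify(read, library, k)
        if taxon is None:
            continue  # unclassified reads contribute nothing in this sketch
        for km in kmers(read, k):
            if km in library:
                continue
            if km in novel and novel[km] != taxon:
                novel[km] = None
            else:
                novel[km] = taxon
    return {**library, **novel}


# Toy data, invented for this example.
references = {"A": "ACGTACGTTT", "B": "GGGCCCAAAT"}
k = 4
library = build_library(references, k)
mutated_read = "ACGTACGA"   # resembles A but ends in a k-mer not in the reference
distant_read = "ACGAGGTC"   # shares no k-mer with either reference
augmented = augment_library([mutated_read], library, k)
```

In this toy case `distant_read` is unclassified against the original library and is assigned to taxon A after augmentation, because the k-mer `ACGA` was learned from `mutated_read`.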
Pages: 220-229
Number of pages: 10