DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection

Cited by: 7
Authors
Ding, Xiao [1]
Cheng, Fudong [1]
Cao, Changchang [1]
Sun, Xiao [1]
Affiliations
[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing 210096, Jiangsu, Peoples R China
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Funding
National Natural Science Foundation of China;
Keywords
Alignment-free; Metagenome; Classification; Sequence feature; Feature selection; RNA GENE DATABASE; GUT MICROBIOTA; ALGORITHM;
DOI
10.1186/s12859-015-0753-3
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Continual progress in next-generation sequencing makes it possible to generate increasingly large metagenomes sampled across time or space. Comparing and classifying metagenomes from different microbial communities is critical. Alignment-free supervised classification is important for discriminating between the multifarious components of metagenomic samples, because it can be accomplished independently of known microbial genomes. Results: We propose an alignment-free supervised metagenomic classification method called DectICO. The intrinsic correlation of oligonucleotides (ICO) provides the feature set, which is selected dynamically using a kernel partial least squares algorithm, and the feature matrices extracted with this set are sequentially employed to train support vector machine (SVM) classifiers. We evaluated the classification performance of DectICO on three real metagenomic sequencing datasets, two containing deep-sequencing metagenomes and one of low coverage. Validation results show that DectICO is powerful, performs well with long oligonucleotides (i.e., 6-mers to 8-mers), and is more stable and generalizes better than a sequence-composition-based method. The classifiers trained by our method are more accurate than those obtained with non-dynamic feature selection methods and with a recently published recursive-SVM-based classification approach. Conclusions: The alignment-free supervised classification method DectICO can accurately classify metagenomic samples without depending on known microbial genomes. Selecting the ICO dynamically offers better stability and generality than sequence-composition-based classification algorithms. Our proposed method provides new insights into metagenomic sample classification.
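As a rough illustration of the oligonucleotide (k-mer) features the abstract refers to, the sketch below computes a normalized k-mer frequency vector for one sample. This is plain sequence composition, the kind of baseline the abstract compares against; it is not the ICO feature, the kernel partial least squares selection, or the SVM training, none of which this record specifies. The function name `kmer_frequencies` and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(reads, k):
    """Return normalized k-mer frequencies pooled over all reads.

    The vector is ordered lexicographically over the alphabet ACGT,
    so it always has 4**k entries. This is a composition baseline,
    NOT the ICO feature described in the abstract.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical toy reads standing in for one metagenomic sample.
sample_reads = ["ACGTACGT", "TTGACN"]
vector = kmer_frequencies(sample_reads, k=2)
print(len(vector))
```

One such vector per sample, stacked into a matrix, is the usual input shape for a supervised classifier. For the 6-mers to 8-mers mentioned in the abstract, the vector grows to 4096 to 65536 entries per sample.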
Pages: 12
Related papers
50 in total
  • [1] DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection
    Xiao Ding
    Fudong Cheng
    Changchang Cao
    Xiao Sun
    BMC Bioinformatics, 16
  • [2] Toward an Alignment-Free Method for Feature Extraction and Accurate Classification of Viral Sequences
    Lebatteux, Dylan
    Remita, Amine M.
    Diallo, Abdoulaye Banire
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2019, 26 (06) : 519 - 535
  • [3] A Predictive Alignment-free Method based on Logistic Regression for Feature Selection and Classification of Protein Sequences
    Goncalves Marinho Couto, Braulio Roberto
    Santoro, Marcelo Matos
    Ladeira, Ana Paula
    dos Santos, Marcos A.
    BIOINFORMATICS 2013: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOINFORMATICS MODELS, METHODS AND ALGORITHMS, 2013 : 171 - 177
  • [4] Alignment-free supervised classification of metagenomes by recursive SVM
    Hongfei Cui
    Xuegong Zhang
    BMC Genomics, 14
  • [5] Alignment-free supervised classification of metagenomes by recursive SVM
    Cui, Hongfei
    Zhang, Xuegong
    BMC GENOMICS, 2013, 14
  • [6] An alignment-free method for classification of protein sequences
    Deshmukh, Sandeep
    Khaitan, Sanjeet
    Das, Debasish
    Gupta, Manish
    Wangikar, Pramod P.
    PROTEIN AND PEPTIDE LETTERS, 2007, 14 (07) : 647 - 657
  • [7] Local Binary Patterns as a Feature Descriptor in Alignment-Free Visualisation of Metagenomic Data
    Kouchaki, Samaneh
    Tirunagari, Santosh
    Tapinos, Avraam
    Robertson, David L.
    PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2016,
  • [8] A novel chemical property-based, alignment-free scalable feature extraction method for genomic data clustering
    Dwivedi, Rajesh
    Tiwari, Aruna
    Bharill, Neha
    Ratnaparkhe, Milind
    Singh, Saurabh Kumar
    Tripathi, Abhishek
    COMPUTERS & ELECTRICAL ENGINEERING, 2025, 123
  • [9] POSMM: an efficient alignment-free metagenomic profiler that complements alignment-based profiling
    David J. Burks
    Vaidehi Pusadkar
    Rajeev K. Azad
    Environmental Microbiome, 18
  • [10] POSMM: an efficient alignment-free metagenomic profiler that complements alignment-based profiling
    Burks, David J.
    Pusadkar, Vaidehi
    Azad, Rajeev K.
    ENVIRONMENTAL MICROBIOME, 2023, 18 (01)