DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection

Cited by: 7
Authors
Ding, Xiao [1]
Cheng, Fudong [1]
Cao, Changchang [1]
Sun, Xiao [1]
Affiliation
[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing 210096, Jiangsu, Peoples R China
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Funding
National Natural Science Foundation of China
Keywords
Alignment-free; Metagenome; Classification; Sequence feature; Feature selection; RNA gene database; Gut microbiota; Algorithm
DOI
10.1186/s12859-015-0753-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Continual progress in next-generation sequencing makes it possible to generate increasingly large metagenomes sampled over time or across space. Comparing and classifying metagenomes from different microbial communities is therefore critical. Alignment-free supervised classification is important for discriminating between the diverse components of metagenomic samples because it can be carried out independently of known microbial genomes.

Results: We propose an alignment-free supervised metagenomic classification method called DectICO. The intrinsic correlation of oligonucleotides (ICO) provides the feature set, which is selected dynamically using a kernel partial least squares algorithm; the feature matrices extracted with this set are then used sequentially to train classifiers with a support vector machine (SVM). We evaluated the classification performance of DectICO on three real metagenomic sequencing datasets, two containing deeply sequenced metagenomes and one of low coverage. Validation results show that DectICO is powerful, performs well with long oligonucleotides (i.e., 6-mers to 8-mers), and is more stable and generalizable than a sequence-composition-based method. The classifiers trained by our method are more accurate than those obtained with non-dynamic feature selection methods and with a recently published recursive-SVM-based classification approach.

Conclusions: The alignment-free supervised classification method DectICO can accurately classify metagenomic samples without depending on known microbial genomes. Selecting ICO features dynamically offers better stability and generality than sequence-composition-based classification algorithms. Our method provides new insights into metagenomic sample classification.
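The abstract does not spell out how the ICO features are computed. As a rough illustration only, the Python sketch below assumes that the intrinsic correlation of a k-mer w can be approximated by its observed count divided by the count expected under a (k-2)-order Markov model, E[w] = N(w[:-1]) * N(w[1:]) / N(w[1:-1]). This is a standard composition statistic and not necessarily the formula used in DectICO. The function names are hypothetical, and the kernel partial least squares selection and SVM training steps are not shown.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers over the ACGT alphabet.

    Windows containing any other symbol (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def markov_ratio_features(seq: str, k: int) -> dict:
    """Hypothetical ICO-like feature vector (an assumption, not DectICO's definition).

    For every k-mer w (k >= 3), return observed / expected, where the
    expectation comes from a (k-2)-order Markov model:
        E[w] = N(w[:-1]) * N(w[1:]) / N(w[1:-1])
    The ratio is 0.0 whenever the expectation is zero.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    n_k = kmer_counts(seq, k)
    n_k1 = kmer_counts(seq, k - 1)
    n_k2 = kmer_counts(seq, k - 2)
    features = {}
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        denom = n_k2[w[1:-1]]  # count of the shared (k-2)-mer core
        expected = n_k1[w[:-1]] * n_k1[w[1:]] / denom if denom else 0.0
        features[w] = n_k[w] / expected if expected else 0.0
    return features


if __name__ == "__main__":
    feats = markov_ratio_features("ACGTACGTTGCA", 3)
    print(len(feats), feats["ACG"])
```

In a full pipeline, one such vector per metagenomic sample (for example, computed over all of its reads) would presumably form one row of the feature matrix handed to feature selection and then to the classifier; that aggregation step is likewise an assumption here.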
Pages: 12
Related papers
50 in total
  • [21] A Jaya algorithm based wrapper method for optimal feature selection in supervised classification
    Das, Himansu
    Naik, Bighnaraj
    Behera, H. S.
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (06) : 3851 - 3863
  • [22] Classification of Protein Sequences by a Novel Alignment-Free Method on Bacterial and Virus Families
    Guan, Mengcen
    Zhao, Leqi
    Yau, Stephen S-T
    GENES, 2022, 13 (10)
  • [24] A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns
    Kouchaki, Samaneh
    Tapinos, Avraam
    Robertson, David L.
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [25] A Fast Supervised Method of Feature Ranking and Selection for Pattern Classification
    Samanta, Suranjana
    Das, Sukhendu
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 80 - 85
  • [26] Weighted joint sparse representation-based classification method for robust alignment-free face recognition
    Sun, Bo
    Xu, Feng
    Zhou, Guoyan
    He, Jun
    Ge, Fengxiang
    JOURNAL OF ELECTRONIC IMAGING, 2015, 24 (01)
  • [27] Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification
    Borozan, Ivan
    Watt, Stuart
    Ferretti, Vincent
    BIOINFORMATICS, 2015, 31 (09) : 1396 - 1404
  • [28] Energy Data Catalog Item Extraction Method Based on Semi Supervised Feature Selection
    Wei, Zhen
    Ye, Rong
    Chen, Zhuolin
    Zhang, Zhanghuang
    Zheng, Huan
    Li, Zhenwei
    2021 IEEE IAS INDUSTRIAL AND COMMERCIAL POWER SYSTEM ASIA (IEEE I&CPS ASIA 2021), 2021, : 581 - 585
  • [29] Normalized Feature Vectors: A Novel Alignment-Free Sequence Comparison Method Based on the Numbers of Adjacent Amino Acids
    Huang, De-Shuang
    Yu, Hong-Jie
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (02) : 457 - 467
  • [30] Patch-Set-Based Representation for Alignment-Free Image Set Classification
    Gao, Shenghua
    Zeng, Zinan
    Jia, Kui
    Chan, Tsung-Han
    Tang, Jinhui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2016, 26 (09) : 1646 - 1658