Improving taxonomic classification with feature space balancing

Authors
Fuhl, Wolfgang [1]
Zabel, Susanne [1]
Nieselt, Kay [1]
Affiliations
[1] Univ Tubingen, Inst Biomed Informat IBMI, Sand 14, D-72076 Tubingen, Baden Wurttembe, Germany
Source
BIOINFORMATICS ADVANCES | 2023 / Vol. 3 / Issue 01
Keywords
METAGENOMICS;
DOI
10.1093/bioadv/vbad092
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Modern high-throughput sequencing technologies, such as metagenomic sequencing, generate millions of sequences that need to be assigned to their taxonomic rank. Modern approaches either apply local alignment against existing databases, as in MMseqs2, or use deep neural networks, as in DeepMicrobes and BERTax. Due to the increasing size of datasets and databases, alignment-based approaches are expensive in terms of runtime. Deep learning-based approaches can require specialized hardware and consume large amounts of energy. In this article, we propose to use k-mer profiles of DNA sequences as features for taxonomic classification. Although k-mer profiles have been used before, we were able to significantly increase their predictive power by applying a feature space balancing approach to the training data. This greatly improved the generalization quality of the classifiers. We have implemented different pipelines using our proposed feature extraction and dataset balancing in combination with different simple classifiers, such as bagged decision trees or feature subspace KNNs. By comparing the performance of our pipelines with state-of-the-art algorithms, such as BERTax and MMseqs2, on two different datasets, we show that our pipelines outperform these in almost all classification tasks. In particular, sequences from organisms that were not part of the training data were classified with high precision.
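To make the feature-extraction idea concrete, here is a minimal sketch of what a k-mer profile of a DNA sequence could look like. It is an illustration only, not the authors' implementation: the function name `kmer_profile`, the default `k=3`, the normalization to relative frequencies, and the choice to skip windows containing non-ACGT characters are all assumptions. The feature space balancing step is not sketched, because the abstract does not specify how it works.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=3):
    """Return relative k-mer frequencies of a DNA sequence.

    Hypothetical helper for illustration. The output vector has 4**k
    entries, ordered lexicographically over the alphabet ACGT.
    """
    alphabet = "ACGT"
    # All possible k-mers, e.g. AA, AC, AG, AT, CA, ... for k=2.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]

    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Assumption: windows with ambiguous bases (e.g. N) are ignored,
    # so only k-mers over ACGT contribute to the total.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=2)
    print(len(profile))  # number of features: 4**2
```

A fixed-length vector of this kind could then be passed to a simple classifier such as bagged decision trees, which is the general setup the abstract describes.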
Pages: 7