Improving taxonomic classification with feature space balancing

Cited by: 0
Authors
Fuhl, Wolfgang [1]
Zabel, Susanne [1]
Nieselt, Kay [1]
Affiliation
[1] Univ Tubingen, Inst Biomed Informat IBMI, Sand 14, D-72076 Tubingen, Baden Wurttemberg, Germany
Source
BIOINFORMATICS ADVANCES | 2023 / Vol. 3 / No. 01
Keywords
METAGENOMICS;
DOI
10.1093/bioadv/vbad092
Chinese Library Classification number
Q [Biological sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Modern high-throughput sequencing technologies, such as metagenomic sequencing, generate millions of sequences that need to be taxonomically classified. Current approaches either apply local alignment against existing databases, as MMseqs2 does, or use deep neural networks, as in DeepMicrobes and BERTax. As datasets and databases grow, alignment-based approaches become expensive in terms of runtime. Deep learning-based approaches can require specialized hardware and consume large amounts of energy. In this article, we propose to use k-mer profiles of DNA sequences as features for taxonomic classification. Although k-mer profiles have been used before, we were able to increase their predictive power significantly by applying a feature space balancing approach to the training data. This greatly improved the generalization quality of the classifiers. We have implemented different pipelines that combine our proposed feature extraction and dataset balancing with different simple classifiers, such as bagged decision trees or feature subspace KNNs. By comparing the performance of our pipelines with state-of-the-art algorithms, such as BERTax and MMseqs2, on two different datasets, we show that our pipelines outperform them in almost all classification tasks. In particular, sequences from organisms that were not part of the training data were classified with high precision.
Pages: 7