Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm

Times cited: 3
Authors
Chatzilygeroudis, Konstantinos I. [1, 2]
Vrahatis, Aristidis G. [3]
Tasoulis, Sotiris K. [3]
Vrahatis, Michael N. [2]
Affiliations
[1] Univ Patras, CEID, Patras, Greece
[2] Univ Patras, Dept Math, Computat Intelligence Lab, Patras, Greece
[3] Univ Thessaly, Dept Comp Sci & Biomed Informat, Volos, Greece
Keywords
Feature selection; Optimization; Single-cell RNA-seq; High-dimensional data; Expression data; Classification; Kernel
DOI
10.1007/978-3-030-92121-7_6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big data methods prevail in the biomedical domain, leading to effective and scalable data-driven approaches. Biomedical data are known for their ultra-high dimensionality, especially those from molecular biology experiments. This property also characterizes the emerging technique of single-cell RNA-sequencing (scRNA-seq), which yields sequence information from individual cells. A reliable way to uncover the complexity of such data is to use Machine Learning approaches, including dimensionality reduction and feature selection methods. Although dimensionality reduction has made remarkable progress on scRNA-seq data, only feature selection can offer deeper interpretability at the gene level, since it highlights the dominant gene features in the given data. To address this challenge, we propose a feature selection framework that uses genetic optimization principles to identify low-dimensional combinations of genes that enhance the classification performance of any off-the-shelf classifier (e.g., LDA or SVM). Our intuition is that by identifying an optimal gene subset, we can enhance the predictive power of scRNA-seq data even if these genes are unrelated to each other. We showcase our proposed framework's effectiveness in two real scRNA-seq experiments with gene dimensions up to 36,708. Our framework can identify very low-dimensional subsets of genes (fewer than 200) while boosting the classifiers' performance. Finally, we provide a biological interpretation of the selected genes, thus providing evidence of our method's utility towards explainable artificial intelligence.
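The general idea in the abstract, a genetic algorithm searching for a small gene subset that maximizes a classifier's accuracy, can be illustrated with a minimal sketch. The abstract does not specify the encoding, the genetic operators, or the fitness details, so everything below is an assumption made for illustration: the function names (`make_toy_data`, `centroid_accuracy`, `select_genes`), the fixed-size subset encoding, truncation selection, union-based crossover, single-gene mutation, and a nearest-centroid classifier standing in for LDA or SVM so the example needs only the standard library. The data are synthetic, not scRNA-seq measurements.

```python
import random

def make_toy_data(n_cells=60, n_genes=50, informative=(3, 7), seed=1):
    """Synthetic two-class 'expression' data; only `informative` genes differ by class."""
    rng = random.Random(seed)
    data = []
    for i in range(n_cells):
        label = i % 2
        x = [rng.gauss(0, 1) for _ in range(n_genes)]
        for g in informative:
            x[g] += 3.0 * label
        data.append((x, label))
    return data

def centroid_accuracy(train, val, genes):
    """Fitness: validation accuracy of a nearest-centroid classifier on `genes` only."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(genes))
        for i, g in enumerate(genes):
            acc[i] += x[g]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

    def dist(x, lab):
        return sum((x[g] - c) ** 2 for g, c in zip(genes, centroids[lab]))

    correct = sum(min(centroids, key=lambda lab: dist(x, lab)) == label
                  for x, label in val)
    return correct / len(val)

def select_genes(train, val, n_genes, subset_size=5, pop_size=20,
                 generations=30, mutation_rate=0.2, seed=0):
    """Assumed GA: each individual is a list of `subset_size` distinct gene indices."""
    rng = random.Random(seed)
    fitness = lambda ind: centroid_accuracy(train, val, ind)
    population = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's genes from the union of both parents.
            child = rng.sample(list(set(a) | set(b)), subset_size)
            # Mutation: swap one gene for a random gene not already in the child.
            if rng.random() < mutation_rate:
                unused = [g for g in range(n_genes) if g not in child]
                child[rng.randrange(subset_size)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return sorted(best), fitness(best)

data = make_toy_data()
train, val = data[:40], data[40:]
genes, acc = select_genes(train, val, n_genes=50)
print("selected genes:", genes, "validation accuracy:", acc)
```

Because the fitness here is measured on the same validation split that guides the search, the reported accuracy is optimistic; a separate test split would be needed for an honest estimate.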
Pages: 66-79
Number of pages: 14