Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm

Cited by: 3
Authors
Chatzilygeroudis, Konstantinos I. [1, 2]
Vrahatis, Aristidis G. [3]
Tasoulis, Sotiris K. [3]
Vrahatis, Michael N. [2]
Affiliations
[1] Univ Patras, CEID, Patras, Greece
[2] Univ Patras, Dept Math, Computat Intelligence Lab, Patras, Greece
[3] Univ Thessaly, Dept Comp Sci & Biomed Informat, Volos, Greece
Keywords
Feature selection; Optimization; Single-cell RNA-seq; High-dimensional data; Expression data; Classification; Kernel
DOI
10.1007/978-3-030-92121-7_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big data methods prevail in the biomedical domain, leading to effective and scalable data-driven approaches. Biomedical data are known for their ultra-high dimensionality, especially those coming from molecular biology experiments. This property also holds for the emerging technique of single-cell RNA-sequencing (scRNA-seq), where we obtain sequence information from individual cells. A reliable way to uncover their complexity is by using Machine Learning approaches, including dimensionality reduction and feature selection methods. Although the former has seen remarkable progress in scRNA-seq data, only the latter can offer deeper interpretability at the gene level, since it highlights the dominant gene features in the given data. Towards tackling this challenge, we propose a feature selection framework that utilizes genetic optimization principles and identifies low-dimensional combinations of gene lists in order to enhance the classification performance of any off-the-shelf classifier (e.g., LDA or SVM). Our intuition is that by identifying an optimal gene subset, we can enhance the prediction power of scRNA-seq data even if these genes are unrelated to each other. We showcase our proposed framework's effectiveness in two real scRNA-seq experiments with gene dimensions up to 36708. Our framework can identify very low-dimensional subsets of genes (fewer than 200) while boosting the classifiers' performance. Finally, we provide a biological interpretation of the selected genes, thus providing evidence of our method's utility towards explainable artificial intelligence.
Pages: 66-79
Number of pages: 14