Intrinsic entropy model for feature selection of scRNA-seq data

Cited by: 4
Authors
Li, Lin [1, 2]
Tang, Hui [3]
Xia, Rui [1, 2]
Dai, Hao [1]
Liu, Rui [3]
Chen, Luonan [1, 4, 5, 6]
Affiliations
[1] Chinese Acad Sci, CAS Ctr Excellence Mol Cell Sci, Shanghai Inst Biochem & Cell Biol, State Key Lab Cell Biol, Shanghai 200031, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] South China Univ Technol, Sch Math, Guangzhou 510640, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Anim Evolut & Genet, Kunming 650223, Yunnan, Peoples R China
[5] Chinese Acad Sci, Univ Chinese Acad Sci, Hangzhou Inst Adv Study, Key Lab Syst Hlth Sci Zhejiang Prov, Hangzhou 310024, Peoples R China
[6] Guangdong Inst Intelligence Sci & Technol, Zhuhai 519031, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
scRNA-seq; feature selection; intrinsic entropy; extrinsic entropy; entropy decomposition; informative genes
DOI
10.1093/jmcb/mjac008
Chinese Library Classification (CLC) number
Q2 [Cell Biology]
Discipline codes
071009; 090102
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have led to extensive study of cellular heterogeneity and cell-to-cell variation. However, the high frequency of dropout events and noise in scRNA-seq data confound the accuracy of downstream analysis, particularly clustering analysis, whose accuracy depends heavily on the selected feature genes. Here, by deriving an entropy decomposition formula, we propose a feature selection method, the intrinsic entropy (IE) model, to identify informative genes for accurate clustering analysis. Specifically, by eliminating the 'noisy' fluctuation, i.e. the extrinsic entropy (EE), we extract the IE of each gene from its total entropy (TE), where TE = IE + EE. We show that the IE of each gene reflects the regulatory fluctuation of that gene in a cellular process, so high-IE genes carry rich information for cell type or state analysis. To validate the high-IE genes, we analyse both simulated and real single-cell datasets and compare the IE model with other representative methods. The results show that our IE model is not only broadly applicable and robust across different clustering and classification methods, but also sensitive to novel cell types. Our results also demonstrate that, in contrast to its total entropy/fluctuation, the intrinsic entropy/fluctuation of a gene carries information rather than noise.
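The abstract states the decomposition TE = IE + EE but does not say how EE is estimated. The sketch below illustrates the idea under an assumption that belongs to this example and not to the authors' IE estimator:

- EE is taken as the entropy a gene's counts would have under pure Poisson sampling noise at the gene's observed mean.
- TE is the plug-in Shannon entropy of the gene's observed counts.
- IE is then the difference, TE - EE, and genes are ranked by it.

The function names (`empirical_entropy`, `poisson_entropy`, `intrinsic_entropy_scores`, `select_high_ie_genes`, `simulate_demo`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def empirical_entropy(counts):
    """Plug-in Shannon entropy (nats) of one gene's counts across cells; used as TE."""
    n = len(counts)
    return -sum((c / n) * math.log(c / n) for c in Counter(counts).values())


def poisson_entropy(mean):
    """Entropy (nats) of a Poisson distribution with the given mean.

    HYPOTHETICAL noise baseline: treated here as the 'extrinsic' entropy a gene
    would show if its counts were pure Poisson sampling noise around one mean.
    """
    if mean <= 0:
        return 0.0
    k_max = int(mean + 12 * math.sqrt(mean) + 30)  # probability mass beyond this is negligible
    h, log_p = 0.0, -mean  # log P(K = 0) = -mean
    for k in range(k_max + 1):
        if k > 0:
            log_p += math.log(mean) - math.log(k)  # log P(K = k) from log P(K = k - 1)
        h -= math.exp(log_p) * log_p
    return h


def intrinsic_entropy_scores(expr):
    """Map gene -> IE = TE - EE, with EE from the hypothetical Poisson baseline."""
    scores = {}
    for gene, counts in expr.items():
        te = empirical_entropy(counts)
        ee = poisson_entropy(sum(counts) / len(counts))
        scores[gene] = te - ee
    return scores


def select_high_ie_genes(expr, k):
    """Return the k genes with the largest IE scores."""
    scores = intrinsic_entropy_scores(expr)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_demo(n_cells=2000, seed=0):
    """Toy data: a homogeneous 'noise' gene and a two-population 'marker' gene."""
    rng = random.Random(seed)
    half = n_cells // 2
    return {
        "noise": [sample_poisson(10.0, rng) for _ in range(n_cells)],
        "marker": [sample_poisson(1.0, rng) for _ in range(half)]
        + [sample_poisson(20.0, rng) for _ in range(n_cells - half)],
    }


if __name__ == "__main__":
    expr = simulate_demo()
    print(intrinsic_entropy_scores(expr))
    print(select_high_ie_genes(expr, k=1))
```

With this baseline, the two toy genes behave differently:

- A gene whose counts look like a single Poisson population gets an IE close to zero.
- A gene that separates two cell populations keeps a clearly positive IE and ranks first.

The Poisson baseline was chosen only because its entropy is easy to compute. A faithful implementation would substitute whatever extrinsic-noise model the IE method actually specifies.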
Pages: 11