Comparison of Gene Selection Methods for Clustering Single-cell RNA-seq Data

Authors
Zhu, Xiaoshu [1 ]
Wang, Jianxin [2 ]
Li, Rongruan [3 ]
Peng, Xiaoqing [4 ,5 ]
Affiliations
[1] Yulin Normal Univ, Sch Comp Sci & Engn, Yulin 537000, Peoples R China
[2] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[3] Guangxi Normal Univ, Sch Comp Sci & Engn, Sch Software, Guilin 541004, Peoples R China
[4] Cent South Univ, Ctr Med Genet, Sch Life Sci, Changsha 400083, Peoples R China
[5] Cent South Univ, Sch Life Sci, Hunan Key Lab Med Genet, Changsha 410083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Single-cell RNA-seq data; data preprocessing; gene selection; cluster; cell type identification; clustering methods; VARIABLE SELECTION; EXPRESSION; REVEALS; IDENTIFICATION; TRANSCRIPTOMICS; RECONSTRUCTION; HETEROGENEITY; DYNAMICS; FATE; BIAS;
DOI
10.2174/1574893618666221103114320
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: In single-cell RNA-seq data, clustering methods are employed to identify cell types in order to understand cell differentiation and development. Because clustering methods are sensitive to the high dimensionality of single-cell RNA-seq data, one effective solution is to select a subset of genes to reduce the dimensionality. Numerous methods, with different underlying assumptions, have been proposed for choosing the subset of genes used for clustering.
Objective: To guide users in choosing a suitable gene selection method, we give an overview of different gene selection methods and compare them in terms of the differences between the selected gene sets, clustering performance, running time, and stability.
Results: We first review the data preprocessing strategies and gene selection methods used in analyzing single-cell RNA-seq data. We then analyze the overlaps among the gene sets selected by different methods and compare the clustering performance obtained with the different feature gene sets. The analysis reveals that the gene sets selected by the methods based on highly variable genes and on high mean genes are the most similar, and that highly variable genes play an important role in clustering. Additionally, selecting too few genes compromises clustering performance: for example, SCMarker selected fewer genes than the other methods and showed poorer clustering performance than M3Drop.
Conclusion: Different gene selection methods perform differently in different scenarios. HVG works well on the full-transcript sequencing datasets, NBDrop and HMG perform better on the 3' end sequencing datasets, M3Drop and HMG are more suitable for big datasets, and SCMarker is the most consistent across different preprocessing methods.
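As a rough illustration of the kind of comparison the abstract describes, the following minimal sketch selects genes by two simplified criteria and measures how much the selected sets overlap. It is not the authors' implementation: the toy count matrix, the function names, the use of plain variance and mean of log-transformed counts as stand-ins for HVG-style and HMG-style selection, and the Jaccard index as the overlap measure are all assumptions made for illustration. Published HVG methods typically model the mean-variance relationship rather than ranking by raw variance.

```python
import numpy as np

def top_k_by_variance(counts, k):
    """Indices of the k genes with the highest variance of log1p counts
    (a simplified, hypothetical stand-in for HVG-style selection)."""
    logged = np.log1p(counts)
    order = np.argsort(logged.var(axis=0))[::-1]  # genes sorted by descending variance
    return set(order[:k].tolist())

def top_k_by_mean(counts, k):
    """Indices of the k genes with the highest mean of log1p counts
    (a simplified, hypothetical stand-in for HMG-style selection)."""
    logged = np.log1p(counts)
    order = np.argsort(logged.mean(axis=0))[::-1]  # genes sorted by descending mean
    return set(order[:k].tolist())

def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy data (assumed): 50 cells x 200 genes, Poisson counts with gene-specific rates.
rng = np.random.default_rng(0)
gene_rates = rng.gamma(2.0, 2.0, size=200)
counts = rng.poisson(lam=gene_rates, size=(50, 200))

hvg = top_k_by_variance(counts, 20)
hmg = top_k_by_mean(counts, 20)
overlap = jaccard(hvg, hmg)
print(f"Jaccard overlap between the two selected gene sets: {overlap:.2f}")
```

In practice each selected gene set would then be passed to the same clustering method, so that differences in clustering quality can be attributed to the gene selection step alone.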
Pages: 1 / 11
Page count: 11