Comparative Analysis of Supervised Cell Type Detection in Single-Cell RNA-seq Data

Cited by: 1
Authors
Vasighizaker, Akram [1]
Hora, Sheena [1]
Trivedi, Yash [1]
Rueda, Luis [1]
Affiliations
[1] Univ Windsor, Sch Comp Sci, Windsor, ON, Canada
Keywords
Cell type identification; scRNA-seq data analysis; Marker gene identification; Feature selection; Classification
DOI
10.1007/978-3-031-07802-6_28
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Single-cell RNA sequencing (scRNA-seq) technology has been widely applied in biological research and drug discovery. Identifying cell types is an essential step before any in-depth investigation of single-cell function for pathological purposes. Several unsupervised learning methods have recently been developed to identify cell types. However, annotating the resulting clusters with the correct cell types requires considerable effort using marker genes. Because annotated datasets are scarce, supervised techniques have not been commonly used in scRNA-seq studies. On the other hand, classification methods use feature selection algorithms to improve prediction accuracy by finding the most informative features among the many in high-dimensional datasets. Hence, classification models can be used to automate the annotation of cell-type clusters. This article evaluates the performance of three state-of-the-art supervised classification methods, namely support vector machine, k-nearest neighbor, and random forest, each combined with three feature selection methods, namely Chi-squared, information gain, and ANOVA F-value. The results of applying the nine combinations of these methods to three standard scRNA-seq datasets show that support vector machine combined with information gain outperforms the other combinations. Moreover, to validate our findings, we investigated reference gene sets and found 11 out of 20 highly variable genes in two different pancreas gene sets. This article sheds some light on the potential use of marker gene identification to improve the automatic identification of cell types.
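As a rough illustration of the kind of pipeline the abstract evaluates (a feature selection method followed by a classifier), the following standard-library Python sketch ranks genes by ANOVA F-value, keeps the top k, and classifies cells with k-nearest neighbors. The toy expression matrix, the "alpha"/"beta" labels, and all function names are assumptions made for illustration; they are not the authors' code, data, or results. A real analysis would more likely rely on a library such as scikit-learn.

```python
from collections import Counter, defaultdict
from math import dist
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single gene across cell-type groups."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    grand_mean = fmean(values)
    n, k = len(values), len(groups)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Perfect separation (or a constant gene): avoid division by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_genes(X, y, k):
    """Return the column indices of the k genes with the highest F-value."""
    n_genes = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(X_train, y_train, x, n_neighbors=3):
    """Majority vote among the n_neighbors closest training cells (Euclidean)."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 6 cells x 4 genes. Genes 0 and 2 differ between the
# two cell types; genes 1 and 3 have identical group means (uninformative).
X = [
    [9.0, 1.0, 0.5, 3.0],
    [8.5, 3.0, 0.2, 1.0],
    [9.2, 2.0, 0.4, 2.0],
    [0.5, 2.0, 7.9, 2.0],
    [0.3, 1.0, 8.4, 3.0],
    [0.8, 3.0, 8.1, 1.0],
]
y = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]

# Step 1: feature selection keeps the two most informative genes.
top = select_top_genes(X, y, k=2)

# Step 2: classify unseen cells using only the selected genes.
X_reduced = [[row[j] for j in top] for row in X]
new_cells = [[8.8, 2.5, 0.3, 1.5], [0.6, 1.5, 8.0, 2.5]]
predictions = [
    knn_predict(X_reduced, y, [cell[j] for j in top]) for cell in new_cells
]
```

Swapping `anova_f` for another scoring function, or `knn_predict` for another classifier, would give the other combinations the abstract mentions; this sketch covers only one of the nine.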
Pages: 333-345
Number of pages: 13