Dirichlet process mixture models for single-cell RNA-seq clustering

Cited by: 6
Authors
Adossa, Nigatu A. [1, 2]
Rytkonen, Kalle T. [1, 2, 3]
Elo, Laura L. [1, 2, 4]
Affiliations
[1] Univ Turku, Turku Biosci Ctr, FI-20520 Turku, Finland
[2] Abo Akad Univ, FI-20520 Turku, Finland
[3] Univ Turku, Res Ctr Integrat Physiol & Pharmacol, Inst Biomed, FI-20014 Turku, Finland
[4] Univ Turku, Inst Biomed, FI-20014 Turku, Finland
Source
BIOLOGY OPEN | 2022 / Vol. 11 / Issue 04
Funding
Academy of Finland
Keywords
Clustering; Hierarchical Dirichlet process (HDP); Latent Dirichlet allocation (LDA); scRNA-seq; Variational inference; Reconstruction
DOI
10.1242/bio.059001
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. One key challenge in cluster analysis is that the number of clusters is unknown, and there is still no comprehensive solution to this problem. To enhance the process of defining a meaningful cluster resolution, we compare the Bayesian latent Dirichlet allocation (LDA) method with its non-parametric counterpart, the hierarchical Dirichlet process (HDP), in the context of clustering scRNA-seq data. A potential key advantage of HDP is that it does not require the number of clusters as an input parameter from the user. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP using four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both intrinsic (Davies-Bouldin index, DB-index) and extrinsic (adjusted Rand index, ARI) cluster quality measures, we show that the performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features, while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.
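The two cluster quality measures named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the toy 2-D points, the label vectors and the function names `adjusted_rand_index` and `davies_bouldin_index` are assumptions made for illustration, and a real analysis would compute these scores on cell embeddings with an established library. The sketch only shows how an intrinsic score (DB-index, lower is better) and an extrinsic score (ARI, 1.0 means perfect agreement with reference labels) respond when a clustering splits one group too finely.

```python
from collections import Counter
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """Extrinsic measure: chance-corrected agreement between two partitions."""
    n = len(labels_true)
    # Pairs of items placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both partitions
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def davies_bouldin_index(points, labels):
    """Intrinsic measure: lower values mean compact, well-separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids = {
        k: tuple(sum(coord) / len(members) for coord in zip(*members))
        for k, members in clusters.items()
    }
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {
        k: sum(dist(p, centroids[k]) for p in members) / len(members)
        for k, members in clusters.items()
    }
    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
reference = [0, 0, 0, 1, 1, 1]        # stand-in for known cell-type labels
two_clusters = [0, 0, 0, 1, 1, 1]     # matches the reference
three_clusters = [0, 0, 0, 1, 1, 2]   # splits the second group (inflated cluster number)

ari_two = adjusted_rand_index(reference, two_clusters)
ari_three = adjusted_rand_index(reference, three_clusters)
db_two = davies_bouldin_index(points, two_clusters)
db_three = davies_bouldin_index(points, three_clusters)

print(f"2 clusters: ARI={ari_two:.3f}, DB={db_two:.3f}")
print(f"3 clusters: ARI={ari_three:.3f}, DB={db_three:.3f}")
```

On this toy input the over-split labelling scores a lower ARI and a higher DB-index than the two-cluster labelling, which is the kind of signal one would inspect when comparing clusterings with different numbers of clusters.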
Pages: 9