Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. A key challenge in cluster analysis is that the number of clusters is unknown, and there is still no comprehensive solution to this problem. To help define a meaningful cluster resolution, we compare the Bayesian latent Dirichlet allocation (LDA) method with its non-parametric counterpart, the hierarchical Dirichlet process (HDP), for clustering scRNA-seq data. A potential major advantage of HDP is that it does not require the user to specify the number of clusters as an input parameter. Although LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP on four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on the number of clusters. Using both an intrinsic cluster quality measure (the Davies-Bouldin index, DB-index) and an extrinsic one (the adjusted Rand index, ARI), we show that the relative performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.
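To make the two cluster quality measures concrete, here is a minimal pure-Python sketch, not the implementation used in the study. The adjusted Rand index is extrinsic: it compares predicted cluster labels with reference labels (e.g. annotated cell types), with 1.0 meaning identical partitions. The Davies-Bouldin index is intrinsic: it uses only cell coordinates and cluster labels, and lower values mean more compact, better-separated clusters. The function names are hypothetical, and the DB-index here assumes Euclidean distances on some numeric representation of the cells (which representation the study used is not stated here).

```python
from collections import Counter
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """Hypothetical helper: ARI between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_true)
    # Contingency counts n_ij: cells in reference class i AND predicted cluster j.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Same-cluster pairs within each labeling (row and column sums of the table).
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def davies_bouldin_index(points, labels):
    """Hypothetical helper: DB-index of a clustering (Euclidean distances)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the DB-index needs at least two clusters")
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        dim = len(members[0])
        c = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        centroids[lab] = c
        # Within-cluster scatter: mean distance of members to their centroid.
        scatter[lab] = sum(dist(m, c) for m in members) / len(members)
    labs = list(clusters)
    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in labs if j != i)
        for i in labs
    ]
    return sum(worst) / len(labs)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 because relabeling clusters does not change the partition. For real data, scikit-learn's `adjusted_rand_score` and `davies_bouldin_score` compute the same quantities.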
Affiliation:
Cent South Univ, Sch Comp Sci & Engn, 932 South Lushan Rd, Changsha 410083, Peoples R China
Fang, Zhaoyu
Zheng, Ruiqing
Affiliation:
Cent South Univ, Sch Comp Sci & Engn, 932 South Lushan Rd, Changsha 410083, Peoples R China
Li, Min
Affiliation:
Cent South Univ, Sch Comp Sci & Engn, 932 South Lushan Rd, Changsha 410083, Peoples R China
Affiliation:
Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
Yang, Yuchen
Huh, Ruth
Affiliation:
Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
Culpepper, Houston W.
Affiliation:
Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
Lin, Yuan
Affiliation:
Peking Univ, Sch Life Sci, Ctr Bioinformat, Beijing 100871, Peoples R China
Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
Love, Michael I.
Affiliation:
Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Li, Yun
Affiliation:
Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA