JOINT for large-scale single-cell RNA-sequencing analysis via soft-clustering and parallel computing

Times cited: 3
Authors
Cui, Tao [1]
Wang, Tingting [1,2]
Affiliations
[1] Georgetown Univ, Med Ctr, Dept Pharmacol & Physiol, Washington, DC 20057 USA
[2] Georgetown Univ, Med Ctr, Interdisciplinary Program Neurosci, Washington, DC 20057 USA
Keywords
RNA-Seq; Single-cell; Dropout; JOINT; Deep learning; Probability; Soft-clustering; DEG; Parallel computing
DOI
10.1186/s12864-020-07302-6
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Single-cell RNA-sequencing (scRNA-Seq) has provided single-cell-level insights into complex biological processes. However, gene expression detection failures (dropouts) are frequent in scRNA-Seq data. This makes it difficult to reliably identify cell types and differentially expressed genes (DEGs). Moreover, single-cell data generated with the 10x Genomics protocol are growing explosively, and scalability issues mean that existing methods will soon reach their computational limits. The single-cell transcriptomics field urgently needs new tools and frameworks to support large-scale single-cell analysis.

Results: To improve the accuracy, robustness, and speed of scRNA-Seq data processing, we propose "JOINT." JOINT is a generalized zero-inflated negative binomial mixture model that performs probability-based cell-type discovery and DEG analysis simultaneously, without the need for imputation. For cell-type identification, JOINT performs soft-clustering. For each cell, it computes the probability that the cell belongs to each cell type, so a single cell can belong to multiple cell types with different probabilities. This differs fundamentally from existing hard-clustering methods, which assign each cell to exactly one cell type. The soft-clustering component substantially improves the accuracy and robustness of single-cell analysis. The gain is largest when scRNA-Seq datasets are noisy and contain many dropout events. JOINT also determines the optimal number of cell types automatically, rather than requiring the user to specify it empirically. The model is formulated as an unsupervised learning problem and solved with the Expectation-Maximization (EM) algorithm. The EM algorithm is implemented in the TensorFlow deep learning framework, and parallel GPU computing dramatically accelerates data analysis.

Conclusions: Taken together, the JOINT algorithm is accurate and efficient for large-scale scRNA-Seq data analysis via parallel computing. The Python package we have developed can be readily applied to advance parallel computing-based single-cell algorithms and to support research in various biological and biomedical fields.
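The abstract describes JOINT only at a high level. The Python sketch below illustrates what soft-clustering with a zero-inflated negative binomial (ZINB) mixture involves. It computes a ZINB log-likelihood, the EM E-step posterior probabilities that assign one cell to several cell types at once, and the M-step update of the mixture weights. It rests on several assumptions and is not the JOINT package itself:

- The function names are hypothetical.
- Genes are assumed to be conditionally independent given the cell type.
- Dispersion and dropout parameters are assumed to be per-gene and shared across cell types.
- The M-step updates for the means, dispersions, and dropout rates are omitted.
- Plain Python stands in for the TensorFlow implementation the authors describe.

```python
import math


def zinb_logpmf(x, mu, theta, pi):
    """Log-probability of a count x under a zero-inflated negative binomial.

    mu: mean of the negative binomial (NB) component
    theta: NB dispersion; the NB variance is mu + mu**2 / theta
    pi: probability of a dropout (structural zero)
    """
    log_nb = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    if x == 0:
        # A zero arises either from a dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A non-zero count can only come from the NB component.
    return math.log(1.0 - pi) + log_nb


def e_step(counts, weights, mus, thetas, pis):
    """Soft assignment: P(cell type k | one cell's counts) for every k.

    counts: observed counts of one cell, one entry per gene
    weights: mixture weight of each cell type
    mus: mus[k][g] is the NB mean of gene g in cell type k
    thetas, pis: per-gene dispersion and dropout probability
    (hypothetical simplification: shared across cell types)
    """
    log_post = []
    for w, mu_k in zip(weights, mus):
        # Assumption: genes are independent given the cell type.
        ll = sum(zinb_logpmf(x, m, t, p)
                 for x, m, t, p in zip(counts, mu_k, thetas, pis))
        log_post.append(math.log(w) + ll)
    # Normalise with log-sum-exp to avoid numerical underflow.
    top = max(log_post)
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_post))
    return [math.exp(v - log_norm) for v in log_post]


def update_weights(resp):
    """M-step for the mixture weights: mean responsibility per cell type."""
    n_cells = len(resp)
    return [sum(r[k] for r in resp) / n_cells for k in range(len(resp[0]))]


# Usage: one cell, three genes, two candidate cell types.
cell = [0, 12, 1]
probs = e_step(cell, weights=[0.5, 0.5],
               mus=[[5.0, 10.0, 1.0], [1.0, 1.0, 8.0]],
               thetas=[2.0, 2.0, 2.0], pis=[0.3, 0.3, 0.3])
print(probs)  # one probability per cell type; they sum to 1
```

A hard-clustering method would keep only the larger of the two probabilities. The E-step instead returns the full probability vector. In a complete EM loop, these responsibilities would weight each cell's contribution to every cell type's parameter updates.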
Pages: 16