JOINT for large-scale single-cell RNA-sequencing analysis via soft-clustering and parallel computing

Cited by: 3
Authors
Cui, Tao [1]
Wang, Tingting [1,2]
Affiliations
[1] Georgetown Univ, Med Ctr, Dept Pharmacol & Physiol, Washington, DC 20057 USA
[2] Georgetown Univ, Med Ctr, Interdisciplinary Program Neurosci, Washington, DC 20057 USA
Keywords
RNA-Seq; Single-cell; Dropout; JOINT; Deep learning; Probability; Soft-clustering; DEG; Parallel computing;
DOI
10.1186/s12864-020-07302-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Single-cell RNA-Sequencing (scRNA-Seq) has provided single-cell level insights into complex biological processes. However, the high frequency of gene expression detection failures in scRNA-Seq data makes it challenging to achieve reliable identification of cell types and Differentially Expressed Genes (DEG). Moreover, with the explosive growth of single-cell data generated using the 10x Genomics protocol, existing methods will soon reach their computational limits due to scalability issues. The single-cell transcriptomics field needs new tools and frameworks to facilitate large-scale single-cell analysis. Results: To improve the accuracy, robustness, and speed of scRNA-Seq data processing, we propose a generalized zero-inflated negative binomial mixture model, "JOINT," that can perform probability-based cell-type discovery and DEG analysis simultaneously without the need for imputation. JOINT performs soft-clustering for cell-type identification by computing, for each cell, the probability of belonging to each cell type; that is, each cell can belong to multiple cell types with different probabilities. This is drastically different from existing hard-clustering methods, where each cell can belong to only one cell type. The soft-clustering component of the algorithm substantially improves the accuracy and robustness of single-cell analysis, especially when the scRNA-Seq datasets are noisy and contain a large number of dropout events. Moreover, JOINT is able to determine the optimal number of cell types automatically rather than requiring it to be specified empirically. Fitting the proposed model is an unsupervised learning problem, which is solved using the Expectation-Maximization (EM) algorithm. The EM algorithm is implemented using the TensorFlow deep learning framework, dramatically accelerating data analysis through parallel GPU computing. Conclusions: Taken together, the JOINT algorithm is accurate and efficient for large-scale scRNA-Seq data analysis via parallel computing. The Python package that we have developed can be readily applied to aid future advances in parallel computing-based single-cell algorithms and research in various biological and biomedical fields.
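The soft-clustering step described in the abstract can be illustrated with a small sketch. This is not the authors' TensorFlow implementation or their generalized model. It is a plain NumPy version of one EM E-step for an ordinary zero-inflated negative binomial (ZINB) mixture, assuming genes are independent given the cell type. The function names `zinb_logpmf` and `soft_assign`, the parameter layout, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def zinb_logpmf(x, mu, r, pi):
    """Log-probability of counts x under a zero-inflated negative binomial.

    mu: mean (> 0), r: dispersion/size (> 0), pi: dropout probability in (0, 1).
    """
    x = np.asarray(x, dtype=float)
    # Negative binomial log-pmf in the mean/dispersion parameterization.
    nb = (_lgamma(x + r) - _lgamma(r) - _lgamma(x + 1.0)
          + r * np.log(r / (r + mu)) + x * np.log(mu / (r + mu)))
    # A zero arises either from dropout (pi) or from the NB itself.
    log_zero = np.logaddexp(np.log(pi), np.log1p(-pi) + nb)
    log_nonzero = np.log1p(-pi) + nb
    return np.where(x == 0, log_zero, log_nonzero)


def soft_assign(counts, weights, mu, r, pi):
    """E-step: posterior probability of each cell type for each cell.

    counts: (cells, genes); weights: (K,); mu, r, pi: (K, genes).
    Returns a (cells, K) array whose rows sum to 1.
    """
    # Assumption: genes are independent given the cell type, so log-likelihoods add.
    log_lik = np.stack(
        [zinb_logpmf(counts, mu[k], r[k], pi[k]).sum(axis=1)
         for k in range(len(weights))],
        axis=1,
    )
    log_post = np.log(weights) + log_lik
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Toy data (made up): 3 cells, 3 genes, 2 candidate cell types.
counts = np.array([[0, 12, 9], [8, 0, 1], [0, 10, 11]])
weights = np.array([0.5, 0.5])
mu = np.array([[1.0, 10.0, 10.0], [8.0, 1.0, 1.0]])
r = np.full((2, 3), 2.0)
pi = np.full((2, 3), 0.3)

resp = soft_assign(counts, weights, mu, r, pi)
print(resp)
```

Each row of `resp` is a probability vector over cell types rather than a single label, which is the difference between soft and hard clustering. A full EM fit would alternate this step with an M-step that updates `weights`, `mu`, `r`, and `pi`.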
Pages: 16
Related papers
50 in total
  • [31] Single-cell isolation by a modular single-cell pipette for RNA-sequencing
    Zhang, Kai
    Gao, Min
    Chong, Zechen
    Li, Ying
    Han, Xin
    Chen, Rui
    Qin, Lidong
    LAB ON A CHIP, 2016, 16 (24) : 4742 - 4748
  • [32] New Insights into Juxtaglomerular Cells via Single-Cell RNA-Sequencing
    Wei, Jin
    Zhang, Jie
    Chan, Jenna
    Thalakola, Anish
    Patel, Kshama
    Yadav, Nikita
    Wang, Lei
    Buggs, Jacentha
    Liu, Ruisheng
    FASEB JOURNAL, 2022, 36
  • [33] Immunology Driven by Large-Scale Single-Cell Sequencing
    Gomes, Tomas
    Teichmann, Sarah A.
    Talavera-Lopez, Carlos
    TRENDS IN IMMUNOLOGY, 2019, 40 (11) : 1011 - 1021
  • [34] A high-efficiency differential expression method for cancer heterogeneity using large-scale single-cell RNA-sequencing data
    Yuan, Xin
    Ma, Shuangge
    Fa, Botao
    Wei, Ting
    Ma, Yanran
    Wang, Yifan
    Lv, Wenwen
    Zhang, Yue
    Zheng, Junke
    Chen, Guoqiang
    Sun, Jing
    Yu, Zhangsheng
    FRONTIERS IN GENETICS, 2022, 13
  • [35] Vacuum-Driven Micropump with Support Columns: Toward Large Scale Single-cell RNA-sequencing
    Hisa, Kento
    Kakugawa, Musashi
    Shibata, Takayuki
    Nagai, Moeto
    2018 INTERNATIONAL CONFERENCE ON MANIPULATION, AUTOMATION AND ROBOTICS AT SMALL SCALES (MARSS), 2018
  • [36] Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing
    Dmitry Usoskin
    Alessandro Furlan
    Saiful Islam
    Hind Abdo
    Peter Lönnerberg
    Daohua Lou
    Jens Hjerling-Leffler
    Jesper Haeggström
    Olga Kharchenko
    Peter V Kharchenko
    Sten Linnarsson
    Patrik Ernfors
    Nature Neuroscience, 2015, 18 : 145 - 153
  • [37] Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing
    Usoskin, Dmitry
    Furlan, Alessandro
    Islam, Saiful
    Abdo, Hind
    Lonnerberg, Peter
    Lou, Daohua
    Hjerling-Leffler, Jens
    Haeggstrom, Jesper
    Kharchenko, Olga
    Kharchenko, Peter V.
    Linnarsson, Sten
    Ernfors, Patrik
    NATURE NEUROSCIENCE, 2015, 18 (01) : 145 - +
  • [38] scDEA: differential expression analysis in single-cell RNA-sequencing data via ensemble learning
    Li, Hui-Sheng
    Le Ou-Yang
    Yuan Zhu
    Hong Yan
    Zhang, Xiao-Fei
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [39] Single-Cell RNA-Sequencing Analysis Provides Insights into IgA Nephropathy
    Xia, Ming
    Li, Yifu
    Liu, Yu
    Dong, Zheng
    Liu, Hong
    BIOMOLECULES, 2025, 15 (02)
  • [40] Spectral Clustering of Single-Cell RNA-Sequencing Data by Multiple Feature Sets Affinity
    Liu, Yang
    Li, Feng
    Shang, Junliang
    Ge, Daohui
    Ren, Qianqian
    Li, Shengjun
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT III, 2023, 14088 : 268 - 278