JOINT for large-scale single-cell RNA-sequencing analysis via soft-clustering and parallel computing

Cited by: 3
Authors
Cui, Tao [1]
Wang, Tingting [1, 2]
Affiliations
[1] Georgetown Univ, Med Ctr, Dept Pharmacol & Physiol, Washington, DC 20057 USA
[2] Georgetown Univ, Med Ctr, Interdisciplinary Program Neurosci, Washington, DC 20057 USA
Keywords
RNA-Seq; Single-cell; Dropout; JOINT; Deep learning; Probability; Soft-clustering; DEG; Parallel computing
DOI
10.1186/s12864-020-07302-6
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Single-cell RNA-Sequencing (scRNA-Seq) has provided single-cell level insights into complex biological processes. However, the high frequency of gene expression detection failures in scRNA-Seq data makes it challenging to reliably identify cell types and Differentially Expressed Genes (DEG). Moreover, with the explosive growth of single-cell data generated with the 10x Genomics protocol, existing methods will soon reach their computational limits due to scalability issues. The single-cell transcriptomics field urgently needs new tools and frameworks to facilitate large-scale single-cell analysis.

Results: To improve the accuracy, robustness, and speed of scRNA-Seq data processing, we propose a generalized zero-inflated negative binomial mixture model, "JOINT," that can perform probability-based cell-type discovery and DEG analysis simultaneously without the need for imputation. JOINT performs soft-clustering for cell-type identification by computing, for each cell, the probability of membership in each cell type; that is, each cell can belong to multiple cell types with different probabilities. This is drastically different from existing hard-clustering methods, where each cell can belong to only one cell type. The soft-clustering component of the algorithm significantly improves the accuracy and robustness of single-cell analysis, especially when the scRNA-Seq datasets are noisy and contain a large number of dropout events. Moreover, JOINT is able to determine the optimal number of cell types automatically, rather than requiring it to be specified empirically. Fitting the proposed model is an unsupervised learning problem, which is solved using the Expectation-Maximization (EM) algorithm. The EM algorithm is implemented using the TensorFlow deep learning framework, which dramatically accelerates data analysis through parallel GPU computing.

Conclusions: Taken together, the JOINT algorithm is accurate and efficient for large-scale scRNA-Seq data analysis via parallel computing. The Python package that we have developed can be readily applied to aid future advances in parallel computing-based single-cell algorithms and research in various biological and biomedical fields.
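The soft-clustering idea described in the abstract can be illustrated with a minimal sketch of a single E-step for a zero-inflated negative binomial (ZINB) mixture. This is not the JOINT package or its API: the function names `zinb_log_pmf` and `soft_assign`, the mean/dispersion parameterization, and all parameter values are assumptions chosen for illustration, and the sketch uses plain Python rather than TensorFlow.

```python
import math

def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    Assumed parameterization (illustrative, not taken from the paper):
    mu = NB mean (> 0), theta = NB dispersion (> 0),
    pi = dropout (zero-inflation) probability in [0, 1).
    """
    log_nb = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    if x == 0:
        # A zero is either a dropout event or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    return math.log(1.0 - pi) + log_nb

def soft_assign(cell, weights, components):
    """E-step for one cell: posterior probability of each cell type.

    cell:       list of counts, one per gene
    weights:    mixing proportion of each cell type
    components: for each cell type, a list of (mu, theta, pi) per gene
    Genes are treated as independent given the cell type (an assumption).
    """
    log_post = []
    for w, params in zip(weights, components):
        ll = math.log(w)
        for x, (mu, theta, pi) in zip(cell, params):
            ll += zinb_log_pmf(x, mu, theta, pi)
        log_post.append(ll)
    # Normalize in log space (log-sum-exp) for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-gene, two-cell-type example.
components = [
    [(0.5, 2.0, 0.3), (10.0, 2.0, 0.3)],  # type A: gene 1 low, gene 2 high
    [(10.0, 2.0, 0.3), (0.5, 2.0, 0.3)],  # type B: gene 1 high, gene 2 low
]
weights = [0.5, 0.5]
probs = soft_assign([0, 12], weights, components)
print(probs)  # membership probabilities for types A and B
```

Each cell receives a probability for every cell type instead of a single hard label. A full EM fit would alternate this step with an M-step that re-estimates the mixing proportions and the per-gene parameters, which the sketch omits.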
Pages: 16
Related papers
50 in total
  • [41] A Data-Driven Clustering Recommendation Method for Single-Cell RNA-Sequencing Data
    Tian, Yu
    Zheng, Ruiqing
    Liang, Zhenlan
    Li, Suning
    Wu, Fang-Xiang
    Li, Min
    TSINGHUA SCIENCE AND TECHNOLOGY, 2021, 26 (05) : 772 - 789
  • [42] Single-cell RNA-sequencing analysis of early sea star development
    Foster, Stephany
    Oulhen, Nathalie
    Fresques, Tara
    Zaki, Hossam
    Wessel, Gary
    DEVELOPMENT, 2022, 149 (22):
  • [43] A Data-Driven Clustering Recommendation Method for Single-Cell RNA-Sequencing Data
    Yu Tian
    Ruiqing Zheng
    Zhenlan Liang
    Suning Li
    Fang-Xiang Wu
    Min Li
    Tsinghua Science and Technology, 2021, 26 (05) : 772 - 789
  • [44] Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods
    Dal Molin, Alessandra
    Baruzzo, Giacomo
    Di Camillo, Barbara
    FRONTIERS IN GENETICS, 2017, 8
  • [45] Quantitative assessment of single-cell RNA-sequencing methods
    Angela R Wu
    Norma F Neff
    Tomer Kalisky
    Piero Dalerba
    Barbara Treutlein
    Michael E Rothenberg
    Francis M Mburu
    Gary L Mantalas
    Sopheak Sim
    Michael F Clarke
    Stephen R Quake
    Nature Methods, 2014, 11 : 41 - 46
  • [46] CHARACTERIZING THE GBM CELLULAR LANDSCAPE BY LARGE-SCALE SINGLE-NUCLEUS RNA-SEQUENCING
    Spitzer, Avishay
    Nomura, Masashi
    Garofano, Luciano
    Johnson, Kevin
    Nehar-Belaid, Djamel
    Oh, Young Taek
    Anderson, Kevin J.
    Najac, Ryan D.
    Bussema, Lillian
    Varn, Frederick
    D'Angelo, Fulvio
    Chowdhury, Tamrin
    Migliozzi, Simona
    Park, Jong Bae
    Ermini, Luca
    Golebiewska, Anna
    Niclou, Simone
    Das, Sunit
    Paek, Sun Ha
    Moon, Hyo-Eun
    Mathon, Bertrand
    Di Stefano, Anna-Luisa
    Bielle, Franck
    Laurenge, Alice
    Sanson, Marc
    Tanaka, Shota
    Saito, Nobuhito
    Keir, Steve
    Ashley, David
    Huse, Jason
    Yung, W. K. Alfred
    Lasorella, Anna
    Verhaak, Roel
    Iavarone, Antonio
    Tirosh, Itay
    Suva, Mario
    NEURO-ONCOLOGY, 2023, 25
  • [47] Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data
    Lijia Yu
    Yue Cao
    Jean Y. H. Yang
    Pengyi Yang
    Genome Biology, 23
  • [48] Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data
    Yu, Lijia
    Cao, Yue
    Yang, Jean Y. H.
    Yang, Pengyi
    GENOME BIOLOGY, 2022, 23 (01)
  • [49] Large-scale RNA-sequencing of Schizophrenia Brains by the CommonMind Consortium
    Sklar, Pamela
    NEUROPSYCHOPHARMACOLOGY, 2013, 38 : S492 - S492
  • [50] One-step spectral clustering of weighted variables on single-cell RNA-sequencing data
    Park, Min Young
    Park, Seyoung
    KOREAN JOURNAL OF APPLIED STATISTICS, 2020, 33 (04) : 511 - 526