Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data

Cited by: 5
Authors
Wei, Nana [1]
Nie, Yating [1]
Liu, Lin [2]
Zheng, Xiaoqi [3,4]
Wu, Hua-Jun [5,6]
Affiliations
[1] Shanghai Normal Univ, Dept Math, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, SJTU Yale Joint Ctr Biostat & Data Sci, CMA Shanghai, Inst Nat Sci, MOE LSC, Sch Math Sci, Shanghai, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[4] Shanghai Jiao Tong Univ, Ctr Single Cell Omics, Sch Publ Hlth, Sch Med, Shanghai, Peoples R China
[5] Peking Univ Hlth Sci Ctr, Ctr Precis Med Multiom Res, Sch Basic Med Sci, Beijing, Peoples R China
[6] Peking Univ Canc Hosp & Inst, Beijing, Peoples R China
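Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords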
HETEROGENEITY; TRANSCRIPTOMES; FATE;
DOI
10.1371/journal.pcbi.1010753
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Identifying cell clusters is a critical step in single-cell transcriptomics studies. Despite the numerous clustering tools developed recently, the rapid growth in scRNA-seq data volumes calls for more computationally efficient clustering methods. Here, we introduce Secuer, a Scalable and Efficient speCtral clUstERing algorithm for scRNA-seq data. By employing an anchor-based bipartite graph representation algorithm, Secuer reduces runtime and memory usage by more than one order of magnitude for datasets with more than 1 million cells. Meanwhile, Secuer achieves accuracy better than or comparable to competing methods on small and moderate-sized benchmark datasets. Furthermore, we show that Secuer can serve as a building block for a new consensus clustering method, Secuer-consensus, which improves on the runtime and scalability of state-of-the-art consensus clustering methods while maintaining their accuracy. Overall, Secuer is a versatile, accurate, and scalable clustering framework suitable for small to ultra-large single-cell clustering tasks.
Author summary
Recently, single-cell RNA sequencing (scRNA-seq) has enabled profiling of thousands to millions of cells, spurring the development of efficient clustering algorithms for large and ultra-large datasets. In this work, we developed Secuer, an ultrafast clustering method for small to ultra-large scRNA-seq data. Using simulated and real datasets, we demonstrated that Secuer yields high accuracy while reducing runtime and memory usage by orders of magnitude, and that it scales efficiently to ultra-large datasets. Additionally, with Secuer as a subroutine, we proposed Secuer-consensus, a consensus clustering algorithm. Our results show that Secuer-consensus performs better in terms of both clustering accuracy and runtime.
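The abstract credits Secuer's speed to an anchor-based bipartite graph representation. As a rough illustration of that general idea, and not Secuer's actual code, the sketch below implements generic landmark (anchor)-based spectral clustering with NumPy: a small set of p anchors is chosen, each cell is linked only to its nearest anchors, and the spectral embedding comes from an SVD of the normalized n x p cell-anchor matrix rather than an eigendecomposition of a full n x n cell-cell graph. The function names (`sq_dists`, `kmeans`, `anchor_spectral_clustering`), the choice of anchors as k-means centers of a random subsample, the Gaussian kernel with a mean-distance bandwidth, and the defaults for `n_anchors` and `k_nn` are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of anchor-based (landmark) spectral clustering.
# NOT Secuer's implementation: anchor selection, kernel bandwidth and the
# defaults below are assumptions chosen for clarity.


def sq_dists(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    d2 = (A ** 2).sum(axis=1)[:, None] - 2.0 * A @ B.T + (B ** 2).sum(axis=1)[None, :]
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, np.array(centers)).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=probs)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels


def anchor_spectral_clustering(X, n_clusters, n_anchors=50, k_nn=5, seed=0):
    """Cluster rows of X (cells x features, e.g. PCA scores) via a cell-anchor bipartite graph."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # 1. Anchors: k-means centers of a random subsample (assumed strategy).
    sub = X[rng.choice(n, size=min(n, 10 * n_anchors), replace=False)]
    anchors, _ = kmeans(sub, n_anchors, seed=seed)
    # 2. Bipartite affinity B (n x p): each cell links only to its k_nn nearest anchors.
    d2 = sq_dists(X, anchors)
    nn = np.argsort(d2, axis=1)[:, :k_nn]
    nn_d2 = np.take_along_axis(d2, nn, axis=1)
    sigma2 = nn_d2.mean() + 1e-12          # bandwidth heuristic (assumption)
    w = np.exp(-nn_d2 / (2.0 * sigma2))
    w /= w.sum(axis=1, keepdims=True)      # each cell's anchor weights sum to 1
    B = np.zeros((n, anchors.shape[0]))
    np.put_along_axis(B, nn, w, axis=1)
    # 3. Normalize by anchor degree D; left singular vectors of B D^{-1/2} are the
    #    top eigenvectors of the implied n x n cell graph B D^{-1} B^T.
    anchor_deg = B.sum(axis=0)
    U, _, _ = np.linalg.svd(B / np.sqrt(anchor_deg + 1e-12), full_matrices=False)
    emb = U[:, :n_clusters]
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
    # 4. Ordinary k-means in the low-dimensional spectral embedding.
    _, labels = kmeans(emb, n_clusters, seed=seed)
    return labels
```

Because each cell keeps only `k_nn` anchor links, the cell-anchor matrix has n x `k_nn` nonzeros and the decomposition involves only p anchor columns, which is where an anchor-based approach saves time and memory relative to an n x n graph. For readability this sketch stores B densely; a version meant for millions of cells would more likely use a sparse matrix and a truncated SVD (for example `scipy.sparse.linalg.svds`).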
Pages: 20