MISC: missing imputation for single-cell RNA sequencing data

Cited by: 12
Authors
Yang, Mary Qu [1 ]
Weissman, Sherman M. [2 ]
Yang, William [2 ,3 ]
Zhang, Jialing [2 ]
Canaan, Allon [2 ]
Guan, Renchu [1 ,4 ]
Affiliations
[1] Univ Arkansas Little Rock George Washington Donag, Joint Bioinformat Program, Little Rock, AR 72204 USA
[2] Yale Univ, Dept Genet, New Haven, CT 06512 USA
[3] Carnegie Mellon Univ, Sch Comp Sci, Dept Comp Sci, Pittsburgh, PA 15213 USA
[4] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
Source
BMC SYSTEMS BIOLOGY | 2018 / Volume 12
Keywords
Missing data; Single-cell RNA-seq; False negative curve; Zero-inflated model;
DOI
10.1186/s12918-018-0638-y
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) technology provides an effective way to study cell heterogeneity. However, due to low capture efficiency and stochastic gene expression, scRNA-seq data often contain a high percentage of missing values. It has been shown that the missing rate can reach approximately 30% even after noise reduction. To accurately recover missing values in scRNA-seq data, we need to know where the data are missing, how much data is missing, and what the missing values are.

Methods: To solve these three problems, we propose a novel model with a hybrid machine learning method, namely, missing imputation for single-cell RNA-seq (MISC). To solve the first problem, we transformed it into a binary classification problem on the RNA-seq expression matrix. Then, for the second problem, we searched for the intersection of the classification results, the zero-inflated model results, and the false negative model results. Finally, we used a regression model to recover the data in the missing elements.

Results: We compared the raw data without imputation, the mean-smooth neighbor cell trajectory, and MISC on chronic myeloid leukemia (CML) data and on the primary somatosensory cortex and the hippocampal CA1 region of mouse brain cells. On the CML data, MISC discovered a trajectory branch from CP-CML to BC-CML, which provides direct evidence of evolution from CP to BC stem cells. On the mouse brain data, MISC clearly divided the pyramidal CA1 cells into different branches, which is direct evidence of subpopulations within pyramidal CA1. In addition, with MISC, the oligodendrocyte cells became an independent group with an apparent boundary.

Conclusions: Our results showed that the MISC model improved cell type classification and could be instrumental in studying cellular heterogeneity. Overall, MISC is a robust missing data imputation model for single-cell RNA-seq data.
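The three steps described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier, the zero-inflated model, the false negative model, or the regression model, so the function names `intersect_masks` and `impute_by_regression` are hypothetical, the three boolean masks are assumed to be produced elsewhere, and ordinary least squares is used only as a stand-in regressor.

```python
import numpy as np


def intersect_masks(classifier_mask, zero_inflated_mask, false_negative_mask):
    """Keep only the entries that all three (assumed) models flag as missing."""
    return classifier_mask & zero_inflated_mask & false_negative_mask


def impute_by_regression(X, missing):
    """Fill flagged entries gene by gene with an ordinary least-squares fit.

    X is a (cells, genes) expression matrix; missing is a boolean mask of the
    same shape. OLS is an assumption here, not the regressor used by MISC.
    """
    X = np.asarray(X, dtype=float)
    X_imputed = X.copy()
    n_cells, n_genes = X.shape
    for g in range(n_genes):
        miss = missing[:, g]
        if not miss.any() or miss.all():
            continue  # nothing to impute, or nothing to train on
        # Predictors: an intercept plus the raw values of all other genes.
        others = np.delete(X, g, axis=1)
        A = np.column_stack([np.ones(n_cells), others])
        coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, g], rcond=None)
        # Expression values cannot be negative, so clip predictions at zero.
        X_imputed[miss, g] = np.clip(A[miss] @ coef, 0.0, None)
    return X_imputed
```

In this sketch an entry is imputed only when every mask agrees that it is missing, which mirrors the intersection step in the abstract; zeros flagged by fewer than three sources are left untouched as true zeros.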
Pages: 9
Related papers
50 in total
  • [1] MISC: missing imputation for single-cell RNA sequencing data (vol 12, 114, 2018)
    Yang, Mary Qu
    Weissman, Sherman M.
    Yang, William
    Zhang, Jialing
    Canaan, Allon
    Guan, Renchu
    [J]. BMC SYSTEMS BIOLOGY, 2019, 13
  • [2] Dropout imputation and batch effect correction for single-cell RNA sequencing data
    Li Gang
    Yang Yuchen
    Van Buren Eric
    Li Yun
    Department of Statistics and Operations Research
    Department of Genetics
    Department of Biostatistics
    Department of Computer Science
    [J]. Journal of Bio-X Research, 2019, 02 (04) : 169 - 177
  • [3] Regulatory network-based imputation of dropouts in single-cell RNA sequencing data
    Leote, Ana Carolina
    Wu, Xiaohui
    Beyer, Andreas
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2022, 18 (02)
  • [4] Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data
    Xu, Junlin
    Cui, Lingyu
    Zhuang, Jujuan
    Meng, Yajie
    Bing, Pingping
    He, Binsheng
    Tian, Geng
    Pui, Choi Kwok
    Wu, Taoyang
    Wang, Bing
    Yang, Jialiang
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 146
  • [5] Missing data and technical variability in single-cell RNA-sequencing experiments
    Hicks, Stephanie C.
    Townes, F. William
    Teng, Mingxiang
    Irizarry, Rafael A.
    [J]. BIOSTATISTICS, 2018, 19 (04) : 562 - 578
  • [6] Single-cell RNA sequencing data imputation using bi-level feature propagation
    Lee, Junseok
    Yun, Sukwon
    Kim, Yeongmin
    Chen, Tianlong
    Kellis, Manolis
    Park, Chanyoung
    [J]. BRIEFINGS IN BIOINFORMATICS, 2024, 25 (03)
  • [7] A systematic evaluation of single-cell RNA-sequencing imputation methods
    Hou, Wenpin
    Ji, Zhicheng
    Ji, Hongkai
    Hicks, Stephanie C.
    [J]. GENOME BIOLOGY, 2020, 21 (01)
  • [8] A systematic evaluation of single-cell RNA-sequencing imputation methods
    Wenpin Hou
    Zhicheng Ji
    Hongkai Ji
    Stephanie C. Hicks
    [J]. Genome Biology, 21
  • [9] Recovery of missing single-cell RNA-sequencing data with optimized transcriptomic references
    Pool, Allan-Hermann
    Poldsam, Helen
    Chen, Sisi
    Thomson, Matt
    Oka, Yuki
    [J]. NATURE METHODS, 2023, 20 (10) : 1506 - +
  • [10] Recovery of missing single-cell RNA-sequencing data with optimized transcriptomic references
    Allan-Hermann Pool
    Helen Poldsam
    Sisi Chen
    Matt Thomson
    Yuki Oka
    [J]. Nature Methods, 2023, 20 : 1506 - 1515