Regulatory network-based imputation of dropouts in single-cell RNA sequencing data

Cited by: 8
Authors
Leote, Ana Carolina [1, 2, 3]
Wu, Xiaohui [1, 4, 5]
Beyer, Andreas [1, 2, 3, 6, 7, 8]
Affiliations
[1] Cluster Excellence Cellular Stress Responses Agin, Cologne, Germany
[2] Univ Cologne, Fac Med, Cologne, Germany
[3] Cologne Univ Hosp, Cologne, Germany
[4] Xiamen Univ, Dept Automat, Xiamen, Peoples R China
[5] Soochow Univ, Pasteurien Coll, Suzhou, Peoples R China
[6] Ctr Mol Med Cologne CMMC, Cologne, Germany
[7] Univ Cologne, Cologne Sch Computat Biol, Cologne, Germany
[8] Univ Cologne, Ctr Data Sci & Simulat, Cologne, Germany
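Funding
National Natural Science Foundation of China
Keywords
GENE-EXPRESSION
DOI
10.1371/journal.pcbi.1009849
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract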
Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values ('dropout imputation'). Most existing dropout imputation methods are limited in the sense that they exclusively use the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Further, it is unknown whether all genes benefit equally from imputation or which imputation method works best for a given gene. Here, we show that a transcriptional regulatory network learned from external, independent gene expression data improves dropout imputation. Using a variety of human scRNA-seq datasets, we demonstrate that our network-based approach outperforms published state-of-the-art methods. The network-based approach performs particularly well for lowly expressed genes, including cell-type-specific transcriptional regulators. Further, the cell-to-cell variation of 11.3% to 48.8% of the genes could not be adequately imputed by any of the methods that we tested. In those cases, gene expression levels were best predicted by the mean expression across all cells, i.e., assuming no measurable expression variation between cells. These findings suggest that different imputation methods are optimal for different genes. We thus implemented an R package called ADImpute (available via Bioconductor) that automatically determines the best imputation method for each gene in a dataset. Our work represents a paradigm shift by demonstrating that there is no single best imputation method. Instead, we propose that imputation should maximally exploit external information and be adapted to gene-specific features, such as expression level and expression variation across cells.
Author summary
Single-cell RNA-sequencing (scRNA-seq) allows for gene expression to be quantified in individual cells and thus plays a critical role in revealing differences between cells within tissues and characterizing them in healthy and pathological conditions. Because scRNA-seq captures the RNA content of individual cells, lowly expressed genes, for which few RNA molecules are present in the cell, are easily missed. These events are called 'dropouts' and considerably hinder analysis of the resulting data. In this work, we propose to make use of gene-gene relationships, learnt from external and more complete datasets, to estimate the true expression of genes that could not be quantified in a given cell. We show that this approach generally outperforms previously published methods, but also that different genes are better estimated with different methods. To allow the community to use our proposed method and combine it with existing ones, we created the R package ADImpute, available through Bioconductor.
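The idea of choosing an imputation method separately for each gene can be illustrated with a small sketch. This is not the ADImpute API (ADImpute is an R package) and is not taken from the paper; it is a minimal Python illustration under stated assumptions: a genes x cells matrix of normalized expression, a hypothetical `weights` matrix standing in for a regulatory network learned from external data, and a simple hold-out scheme in which some observed values are hidden and each method is scored on how well it recovers them. All function and variable names are invented for the example.

```python
import numpy as np

def gene_mean_baseline(masked):
    """Baseline: predict each gene by its mean over the visible entries."""
    means = np.nanmean(masked, axis=1, keepdims=True)
    return np.broadcast_to(means, masked.shape).copy()

def make_network_method(weights):
    """Hypothetical network predictor: each gene is a weighted sum of other genes.

    `weights` (genes x genes) is assumed to come from external data.
    """
    def predict(masked):
        gene_means = np.nanmean(masked, axis=1, keepdims=True)
        # Hidden predictor values are replaced by the gene's visible mean.
        visible = np.where(np.isnan(masked), gene_means, masked)
        return weights @ visible
    return predict

def choose_method_per_gene(expr, methods, mask_fraction=0.3, seed=0):
    """Hide a fraction of observed (non-zero) entries, score every method on
    the hidden entries, and return the lowest-error method for each gene."""
    rng = np.random.default_rng(seed)
    held_out = (expr > 0) & (rng.random(expr.shape) < mask_fraction)
    masked = expr.astype(float).copy()
    masked[held_out] = np.nan

    names = list(methods)
    n_held = held_out.sum(axis=1)
    scored = n_held > 0
    errors = np.full((len(names), expr.shape[0]), np.inf)
    for i, name in enumerate(names):
        pred = methods[name](masked)
        sq_err = np.where(held_out, (pred - expr) ** 2, 0.0).sum(axis=1)
        errors[i, scored] = sq_err[scored] / n_held[scored]
    errors = np.nan_to_num(errors, nan=np.inf)
    # Genes without held-out entries fall back to the first method listed.
    best = [names[j] for j in errors.argmin(axis=0)]
    return best, errors

# Toy data: gene 0 is fully determined by gene 1; gene 2 does not vary.
rng = np.random.default_rng(1)
g1 = rng.uniform(1.0, 5.0, size=200)
expr = np.vstack([2.0 * g1, g1, np.full(200, 3.0)])
weights = np.zeros((3, 3))
weights[0, 1] = 2.0  # hypothetical regulatory edge: gene 1 -> gene 0

methods = {
    "gene_mean": gene_mean_baseline,
    "network": make_network_method(weights),
}
best, errors = choose_method_per_gene(expr, methods)
```

In this toy setup the gene with a known regulator is assigned the network predictor, while the genes without usable network information are assigned the mean baseline, mirroring the abstract's observation that different genes favour different methods.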
Pages: 25