Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Authors
Christoph Hafemeister
Rahul Satija
Affiliations
[1] New York Genome Center
[2] Center for Genomics and Systems Biology
[3] New York University
Keywords
Single-cell RNA-seq; Normalization
DOI
Not available
Abstract
Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
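The core quantity in the abstract, a Pearson residual from a negative binomial model with sequencing depth as a covariate, can be illustrated with a minimal sketch. It assumes a log link with log10 of the cell's total UMI count as the only covariate and the usual negative binomial variance mu + mu^2/theta. The function name and the parameter values are hypothetical and chosen for illustration only. They are not fitted to data, and the regularization step that pools information across genes is not shown.

```python
import math


def nb_pearson_residual(count, depth, beta0, beta1, theta):
    """Pearson residual of one UMI count under a negative binomial GLM.

    Assumed model (illustrative, log link, depth as sole covariate):
        mu  = exp(beta0 + beta1 * log10(depth))
        var = mu + mu**2 / theta
    """
    mu = math.exp(beta0 + beta1 * math.log10(depth))
    variance = mu + mu ** 2 / theta
    return (count - mu) / math.sqrt(variance)


# Hypothetical per-gene parameters: with beta1 = ln(10) the expected count
# is proportional to depth, here one molecule per 10,000 UMIs.
beta0 = math.log(1e-4)
beta1 = math.log(10)
theta = 10.0

# The same observed count of 3 is scored differently in a shallow
# and a deep cell, because the expected count scales with depth.
for depth in (5_000, 10_000, 40_000):
    z = nb_pearson_residual(3, depth, beta0, beta1, theta)
    print(f"depth={depth:>6}  residual={z:+.3f}")
```

Under these assumptions a count that equals its depth-adjusted expectation gets a residual of zero, so a count that is high only because the cell was sequenced deeply no longer stands out.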
Related papers
50 in total
  • [21] Transcriptome size matters for single-cell RNA-seq normalization and bulk deconvolution
    Lu, Songjian
    Yang, Jiyuan
    Yan, Lei
    Liu, Jingjing
    Wang, Judy Jiaru
    Jain, Rhea
    Yu, Jiyang
    NATURE COMMUNICATIONS, 2025, 16 (01)
  • [22] Adaptive total variation constraint hypergraph regularized NMF for single-cell RNA-seq data analysis
    Zhu, Ya-Li
    Zhang, Xiao-Ning
    Wang, Chuan-Yuan
    Liu, Jin-Xing
    Kong, Xiang-Zhen
    QUANTITATIVE BIOLOGY, 2021, 9 (04) : 451 - 462
  • [23] Using neural networks for reducing the dimensions of single-cell RNA-Seq data
    Lin, Chieh
    Jain, Siddhartha
    Kim, Hannah
    Bar-Joseph, Ziv
    NUCLEIC ACIDS RESEARCH, 2017, 45 (17)
  • [24] Computational analysis of alternative polyadenylation from standard RNA-seq and single-cell RNA-seq data
    Gao, Yipeng
    Li, Wei
    MRNA 3' END PROCESSING AND METABOLISM, 2021, 655 : 225 - 243
  • [25] Analysis of Single-Cell RNA-seq Data by Clustering Approaches
    Zhu, Xiaoshu
    Li, Hong-Dong
    Guo, Lilu
    Wu, Fang-Xiang
    Wang, Jianxin
    CURRENT BIOINFORMATICS, 2019, 14 (04) : 314 - 322
  • [26] Evaluating imputation methods for single-cell RNA-seq data
    Yi Cheng
    Xiuli Ma
    Lang Yuan
    Zhaoguo Sun
    Pingzhang Wang
    BMC Bioinformatics, 24
  • [27] Bayesian Analysis of RNA-Seq Data Using a Family of Negative Binomial Models
    Zhao, Lili
    Wu, Weisheng
    Feng, Dai
    Jiang, Hui
    Nguyen, XuanLong
    BAYESIAN ANALYSIS, 2018, 13 (02) : 411 - 436
  • [28] Clustering and visualization of single-cell RNA-seq data using path metrics
    Manousidaki, Andriana
    Little, Anna
    Xie, Yuying
    PLOS COMPUTATIONAL BIOLOGY, 2024, 20 (05)
  • [29] Differential expression of single-cell RNA-seq data using Tweedie models
    Mallick, Himel
    Chatterjee, Suvo
    Chowdhury, Shrabanti
    Chatterjee, Saptarshi
    Rahnavard, Ali
    Hicks, Stephanie C.
    STATISTICS IN MEDICINE, 2022, 41 (18) : 3492 - 3510
  • [30] Adaptive total variation constraint hypergraph regularized NMF for single-cell RNA-seq data analysis
    Ya-Li Zhu
    Xiao-Ning Zhang
    Chuan-Yuan Wang
    Jin-Xing Liu
    Xiang-Zhen Kong
    Quantitative Biology, 2021, 9 (04) : 451 - 462