Statistics or biology: the zero-inflation controversy about scRNA-seq data

Cited by: 107
Authors
Jiang, Ruochen [1]
Sun, Tianyi [1]
Song, Dongyuan [2]
Li, Jingyi Jessica [1, 3, 4, 5]
Affiliations
[1] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Computat Med, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
Funding
National Science Foundation (USA)
Keywords
CELL GENE-EXPRESSION; SINGLE-CELL; RNA-SEQ; FATE DECISIONS; DNA; RECONSTRUCTION; AMPLIFICATION; IMPUTATION; BINDING; MODEL;
DOI
10.1186/s13059-022-02601-5
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Researchers view vast zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms of adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types: observed counts, imputed counts, and binarized counts; discuss the open questions regarding non-biological zeros; and advocate the importance of transparent analysis.
Pages: 24