Statistics or biology: the zero-inflation controversy about scRNA-seq data

Cited by: 107
Authors
Jiang, Ruochen [1]
Sun, Tianyi [1]
Song, Dongyuan [2]
Li, Jingyi Jessica [1, 3, 4, 5]
Affiliations
[1] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Computat Med, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Biostat, Los Angeles, CA 90095 USA
Funding
National Science Foundation (USA);
Keywords
CELL GENE-EXPRESSION; SINGLE-CELL; RNA-SEQ; FATE DECISIONS; DNA; RECONSTRUCTION; AMPLIFICATION; IMPUTATION; BINDING; MODEL;
DOI
10.1186/s13059-022-02601-5
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Researchers view vast zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms of adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types: observed counts, imputed counts, and binarized counts; discuss the open questions regarding non-biological zeros; and advocate the importance of transparent analysis.
Pages: 24
Related papers
50 in total
  • [21] Contrastive self-supervised clustering of scRNA-seq data
    Ciortan, Madalina
    Defrance, Matthieu
    BMC BIOINFORMATICS, 2021, 22 (01)
  • [22] scVIC: deep generative modeling of heterogeneity for scRNA-seq data
    Xiong, Jiankang
    Gong, Fuzhou
    Ma, Liang
    Wan, Lin
    BIOINFORMATICS ADVANCES, 2024, 4 (01)
  • [23] GNN-based embedding for clustering scRNA-seq data
    Ciortan, Madalina
    Defrance, Matthieu
    BIOINFORMATICS, 2022, 38 (04) : 1037 - 1044
  • [24] scDSSC: Deep Sparse Subspace Clustering for scRNA-seq Data
    Wang, HaiYun
    Zhao, JianPing
    Zheng, ChunHou
    Su, YanSen
    PLOS COMPUTATIONAL BIOLOGY, 2022, 18 (12)
  • [25] Iterative point set registration for aligning scRNA-seq data
    Alavi, Amir
    Bar-Joseph, Ziv
    PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (10)
  • [26] Intrinsic entropy model for feature selection of scRNA-seq data
    Li, Lin
    Tang, Hui
    Xia, Rui
    Dai, Hao
    Liu, Rui
    Chen, Luonan
    JOURNAL OF MOLECULAR CELL BIOLOGY, 2022, 14 (02)
  • [27] Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
    Kubovciak, Jan
    Kolar, Michal
    Novotny, Jiri
    BIOINFORMATICS ADVANCES, 2023, 3 (01)
  • [28] Contrastive self-supervised clustering of scRNA-seq data
    Madalina Ciortan
    Matthieu Defrance
    BMC Bioinformatics, 22
  • [29] Cell lineage inference from SNP and scRNA-Seq data
    Ding, Jun
    Lin, Chieh
    Bar-Joseph, Ziv
    NUCLEIC ACIDS RESEARCH, 2019, 47 (10)
  • [30] SPARSim single cell: a count data simulator for scRNA-seq data
    Baruzzo, Giacomo
    Patuzzi, Ilaria
    Di Camillo, Barbara
    BIOINFORMATICS, 2020, 36 (05) : 1468 - 1475