Theoretical framework for the difference of two negative binomial distributions and its application in comparative analysis of sequencing data

Cited by: 1
Authors
Petrany, Alicia [1]
Chen, Ruoyu [2]
Zhang, Shaoqiang [3]
Chen, Yong [1]
Affiliations
[1] Rowan Univ, Dept Biol & Biomed Sci, Glassboro, NJ 08028 USA
[2] Moorestown High Sch, Moorestown, NJ 08057 USA
[3] Tianjin Normal Univ, Coll Comp & Informat Engn, Tianjin 300387, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
RNA-SEQ; CELL; GENOME; MODELS; MAP;
DOI
10.1101/gr.278843.123
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
High-throughput sequencing (HTS) technologies have been instrumental in investigating biological questions at the bulk and single-cell levels. Comparative analysis of two HTS data sets often relies on testing the statistical significance for the difference of two negative binomial distributions (DOTNB). Although negative binomial distributions are well studied, the theoretical results for DOTNB remain largely unexplored. Here, we derive basic analytical results for DOTNB and examine its asymptotic properties. As a state-of-the-art application of DOTNB, we introduce DEGage, a computational method for detecting differentially expressed genes (DEGs) in scRNA-seq data. DEGage calculates the mean of the sample-wise differences of gene expression levels as the test statistic and determines significant differential expression by computing the P-value with DOTNB. Extensive validation using simulated and real scRNA-seq data sets demonstrates that DEGage outperforms five popular DEG analysis tools: DEGseq2, DEsingle, edgeR, Monocle3, and scDD. DEGage is robust against high dropout levels and exhibits superior sensitivity when applied to balanced and imbalanced data sets, even with small sample sizes. We utilize DEGage to analyze prostate cancer scRNA-seq data sets and identify marker genes for 17 cell types. Furthermore, we apply DEGage to scRNA-seq data sets of mouse neurons with and without fear memory and reveal eight potential memory-related genes overlooked in previous analyses. The theoretical results and supporting software for DOTNB can be widely applied to comparative analyses of dispersed count data in HTS and broad research questions.
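As an illustration of the quantity the abstract calls DOTNB, the following minimal sketch evaluates the probability mass function of Z = X - Y for two independent negative binomial variables by truncated convolution, using only the Python standard library. The function names, the (r, p) parameterization, the truncation bound `k_max`, and the simple two-sided tail probability are all assumptions made for this example; they are not taken from the paper and do not reproduce DEGage's actual test procedure.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k) for a negative binomial with size r > 0 and success
    probability 0 < p < 1, so that the mean is r * (1 - p) / p."""
    if k < 0:
        return 0.0
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def dotnb_pmf(z, r1, p1, r2, p2, k_max=500):
    """P(X - Y = z) for independent X ~ NB(r1, p1) and Y ~ NB(r2, p2).

    The infinite convolution sum is truncated after k_max terms
    (an assumption; increase k_max for heavy-tailed parameters)."""
    start = max(0, -z)
    return sum(
        nb_pmf(k + z, r1, p1) * nb_pmf(k, r2, p2)
        for k in range(start, start + k_max)
    )


def two_sided_p_identical(z_obs, r, p, k_max=500):
    """P(|Z| >= |z_obs|) when X and Y share the same NB(r, p),
    in which case Z is symmetric about zero.

    Illustrative only: this is not the DEGage test statistic."""
    z = abs(z_obs)
    if z == 0:
        return 1.0
    inner = sum(dotnb_pmf(d, r, p, r, p, k_max) for d in range(-z + 1, z))
    return max(0.0, 1.0 - inner)
```

For example, `dotnb_pmf(0, 2, 0.5, 2, 0.5)` gives the probability that two identically distributed counts are equal, and `two_sided_p_identical(5, 2, 0.5)` gives the chance of seeing an absolute difference of at least 5 under that same null.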
Pages: 1636-1650
Page count: 15