Theoretical framework for the difference of two negative binomial distributions and its application in comparative analysis of sequencing data

Cited by: 1
Authors
Petrany, Alicia [1]
Chen, Ruoyu [2]
Zhang, Shaoqiang [3]
Chen, Yong [1]
Affiliations
[1] Rowan Univ, Dept Biol & Biomed Sci, Glassboro, NJ 08028 USA
[2] Moorestown High Sch, Moorestown, NJ 08057 USA
[3] Tianjin Normal Univ, Coll Comp & Informat Engn, Tianjin 300387, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
RNA-SEQ; CELL; GENOME; MODELS; MAP
DOI
10.1101/gr.278843.123
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
High-throughput sequencing (HTS) technologies have been instrumental in investigating biological questions at the bulk and single-cell levels. Comparative analysis of two HTS data sets often relies on testing the statistical significance for the difference of two negative binomial distributions (DOTNB). Although negative binomial distributions are well studied, the theoretical results for DOTNB remain largely unexplored. Here, we derive basic analytical results for DOTNB and examine its asymptotic properties. As a state-of-the-art application of DOTNB, we introduce DEGage, a computational method for detecting differentially expressed genes (DEGs) in scRNA-seq data. DEGage calculates the mean of the sample-wise differences of gene expression levels as the test statistic and determines significant differential expression by computing the P-value with DOTNB. Extensive validation using simulated and real scRNA-seq data sets demonstrates that DEGage outperforms five popular DEG analysis tools: DESeq2, DEsingle, edgeR, Monocle3, and scDD. DEGage is robust against high dropout levels and exhibits superior sensitivity when applied to balanced and imbalanced data sets, even with small sample sizes. We utilize DEGage to analyze prostate cancer scRNA-seq data sets and identify marker genes for 17 cell types. Furthermore, we apply DEGage to scRNA-seq data sets of mouse neurons with and without fear memory and reveal eight potential memory-related genes overlooked in previous analyses. The theoretical results and supporting software for DOTNB can be widely applied to comparative analyses of dispersed count data in HTS and broad research questions.
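To make the idea of a DOTNB-based P-value concrete, the following is a minimal numerical sketch, not the DEGage implementation or the paper's analytical results. It assumes two independent counts X ~ NB(r1, p1) and Y ~ NB(r2, p2), parameterized as the number of failures before the r-th success with 0 < p < 1, and evaluates P(X - Y = d) by a truncated convolution sum. The function names (`nb_pmf`, `dotnb_pmf`, `dotnb_two_sided_p`), the truncation tolerance, and the shared-parameter null hypothesis are all assumptions made for illustration.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k) for X ~ NB(r, p): failures before the r-th success (r > 0, 0 < p < 1)."""
    if k < 0:
        return 0.0
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def dotnb_pmf(d, r1, p1, r2, p2, tol=1e-12, max_terms=100_000):
    """P(X - Y = d) for independent X ~ NB(r1, p1) and Y ~ NB(r2, p2).

    Sums P(X = d + y) * P(Y = y) over y = 0, 1, 2, ... and stops once the
    remaining probability mass of Y (an upper bound on the discarded terms)
    drops below `tol`.
    """
    total = 0.0
    y_mass = 0.0  # running P(Y <= y)
    for y in range(max_terms):
        py = nb_pmf(y, r2, p2)
        y_mass += py
        total += nb_pmf(d + y, r1, p1) * py
        if 1.0 - y_mass < tol:
            break
    return total


def dotnb_two_sided_p(d_obs, r, p):
    """Two-sided P-value P(|D| >= |d_obs|) under an assumed null in which
    X and Y share the same (r, p), so D = X - Y is symmetric about zero."""
    inner = sum(dotnb_pmf(k, r, p, r, p)
                for k in range(-abs(d_obs) + 1, abs(d_obs)))
    return max(0.0, 1.0 - inner)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the functions.
    print(dotnb_pmf(0, 3.0, 0.4, 2.0, 0.5))
    print(dotnb_two_sided_p(5, 2.0, 0.5))
```

The sketch handles a single integer difference; the test statistic described in the abstract is a mean of sample-wise differences, which would need additional handling not shown here.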
Pages: 1636-1650
Page count: 15