Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model

Times cited: 4
Authors
Li, Xiaohong [1 ,2 ]
Wu, Dongfeng [1 ]
Cooper, Nigel G. F. [2 ]
Rai, Shesh N. [1 ]
Affiliations
[1] Univ Louisville, Sch Publ Hlth & Informat Sci, Dept Bioinformat & Biostat, Louisville, KY 40202 USA
[2] Univ Louisville, Sch Med, Dept Anat Sci & Neurobiol, Louisville, KY 40292 USA
Funding
National Institutes of Health (NIH)
Keywords
a Wald test; differentially expressed genes (DEGs); power analysis; RNA-seq; sample size; SIGNATURE;
DOI
10.1515/sagmb-2018-0021
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
High-throughput RNA sequencing (RNA-seq) technology is increasingly used in disease-related biomarker studies. Because gene read counts in RNA-seq data are over-dispersed, the negative binomial distribution has become a popular choice for modeling them. In this study, we propose two explicit sample size calculation methods for RNA-seq data based on a negative binomial regression model. To derive these new sample size formulas, we incorporate a common dispersion parameter and the size factor, which enters as an offset through a natural logarithm link function. A two-sided Wald test statistic derived from the coefficient parameter is used to test a single gene at a nominal significance level of 0.05 and multiple genes at a false discovery rate (FDR) of 0.05. The variance of the Wald statistic is computed from the variance-covariance matrix, with the parameters set to their maximum likelihood estimates under the unrestricted and constrained scenarios. Simulation studies evaluate the performance of our new formulas and compare them side by side with three existing methods based on a Wald test, a likelihood ratio test, or an exact test. Because the other methods are much more computationally intensive, we recommend our M1 method for quick and direct estimation of sample sizes in an experimental design. Finally, we illustrate sample size estimation using an existing breast cancer RNA-seq dataset.
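The abstract outlines a general recipe. It uses a two-sided Wald test on the log fold-change coefficient, with a variance taken from the inverse information under either unrestricted or constrained estimates. It then applies a stricter per-gene level to control the FDR across many genes.

The following is a minimal Python sketch of that recipe. It rests on simplifying assumptions that are ours, not the paper's:

- two groups of equal size n;
- one size factor absorbed into the control-group mean mu0;
- a known common dispersion phi with Var(Y) = mu + phi*mu^2;
- the common FDR approximation m0*alpha/(m0*alpha + r1) = f.

Under these assumptions, the per-group size is n = (z_{1-alpha/2}*sqrt(V0) + z_{1-beta}*sqrt(V1))^2 / beta1^2, where V1 = 1/mu0 + 1/mu1 + 2*phi. V0 equals V1 in the unrestricted version. In the constrained version, V0 uses the pooled null mean. This sketch is not claimed to reproduce the paper's M1 or M2 formulas. The function names wald_sample_size and fdr_adjusted_alpha are hypothetical.

```python
import math
from statistics import NormalDist


def wald_sample_size(mu0, fold_change, dispersion, alpha=0.05, power=0.8,
                     constrained=False):
    """Hypothetical helper: per-group n for a two-sided Wald test of
    beta1 = log(fold_change) in a two-group negative binomial model.

    Assumes equal group sizes, a known common dispersion phi with
    Var(Y) = mu + phi * mu**2, and a common size factor already folded
    into mu0 (the expected control-group count).
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    mu1 = mu0 * fold_change
    beta1 = math.log(fold_change)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Asymptotic variance of beta1_hat (inverse Fisher information),
    # multiplied by n, evaluated at the alternative (unrestricted) means.
    v1 = 1 / mu0 + 1 / mu1 + 2 * dispersion
    if constrained:
        # Under H0 both groups share one mean; with equal allocation its
        # restricted MLE targets the average of the two group means.
        mu_bar = (mu0 + mu1) / 2
        v0 = 2 / mu_bar + 2 * dispersion
    else:
        # Unrestricted version: same variance under H0 and H1.
        v0 = v1

    n = (z_alpha * math.sqrt(v0) + z_beta * math.sqrt(v1)) ** 2 / beta1 ** 2
    return math.ceil(n)


def fdr_adjusted_alpha(m, m1, fdr=0.05, power=0.8):
    """Hypothetical helper: per-gene alpha such that, with m0 = m - m1 null
    genes and r1 = power * m1 expected true discoveries, the approximate
    FDR m0*alpha / (m0*alpha + r1) equals the target fdr."""
    if not 0 < m1 < m:
        raise ValueError("need 0 < m1 < m")
    m0 = m - m1
    r1 = power * m1
    return r1 * fdr / (m0 * (1 - fdr))


if __name__ == "__main__":
    # Single gene at alpha = 0.05 (illustrative inputs, not from the paper).
    print(wald_sample_size(mu0=10, fold_change=2, dispersion=0.1))
    print(wald_sample_size(mu0=10, fold_change=2, dispersion=0.1,
                           constrained=True))
    # 10,000 genes, 100 truly DE, target FDR 0.05.
    a_star = fdr_adjusted_alpha(m=10_000, m1=100)
    print(a_star, wald_sample_size(mu0=10, fold_change=2, dispersion=0.1,
                                   alpha=a_star))
```

Consider illustrative inputs of mu0 = 10, a two-fold change, and phi = 0.1. At alpha = 0.05, both variance choices give 6 samples per group. Replacing alpha with the FDR-adjusted level for 10,000 genes raises n, which reflects the cost of multiple testing. A fuller treatment would use group-specific size factors and an estimated dispersion. It would also solve for n under the per-gene power target jointly.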
Pages: 17