Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size

Cited by: 86
Authors
Yu, Danni [1, 2]
Huber, Wolfgang [1]
Vitek, Olga [2, 3]
Affiliations
[1] European Mol Biol Lab, Genome Biol Unit, D-69117 Heidelberg, Germany
[2] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
[3] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA);
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; MAXIMUM-LIKELIHOOD-ESTIMATION; PARAMETER; REPRODUCIBILITY; NORMALIZATION; POWERFUL; PACKAGE;
DOI
10.1093/bioinformatics/btt143
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: RNA-seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish the systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However, in experiments with small sample size, the per-gene estimates of the dispersion parameter are unreliable.

Method: We propose a simple and effective approach for estimating the dispersions. First, we obtain the initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute and is compatible with the exact test of differential expression.

Results: We evaluated the proposed approach using 10 simulated and experimental datasets and compared its performance with that of currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. For these datasets, sSeq performed favorably for experiments with small sample size in sensitivity, specificity and computational time.
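The two steps named in the Method paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the usual Negative Binomial parameterization, variance = mu + phi * mu^2, and counts already normalized for library size. The function names and the fixed shrinkage weight `weight` are hypothetical, since the abstract does not say how the weight or the common value is computed. For a fixed weight, the common value that minimizes the average squared difference between initial and shrunk estimates is simply the mean of the initial estimates, which is the default target here.

```python
from statistics import mean, variance


def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene.

    Solves variance = mu + phi * mu**2 for phi and truncates at zero.
    Assumes at least two replicates and normalized counts.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    s2 = variance(counts)  # unbiased sample variance
    return max((s2 - mu) / mu ** 2, 0.0)


def shrink(initial, weight, target=None):
    """Shrink per-gene estimates linearly towards a common target.

    `weight` in [0, 1] is a hypothetical fixed tuning value:
    0 keeps the initial estimates, 1 replaces them with the target.
    """
    if target is None:
        target = mean(initial)  # minimizer of the ASD for a fixed weight
    return [(1 - weight) * phi + weight * target for phi in initial]


def avg_sq_diff(initial, shrunk):
    """Average squared difference between initial and shrunk estimates."""
    return mean((a - b) ** 2 for a, b in zip(initial, shrunk))


# Toy data: four genes, three replicates each (made-up numbers).
toy_counts = [[10, 14, 9], [100, 160, 70], [5, 5, 6], [40, 90, 20]]
initial = [mom_dispersion(gene) for gene in toy_counts]
shrunk = shrink(initial, weight=0.5)
```

With only three replicates the initial estimates are noisy, and two of the toy genes are truncated to zero; shrinking pulls every gene towards the shared value and so trades a little bias for lower variance.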
Pages: 1275-1282
Page count: 8
Related papers
50 in total
  • [31] MULTISTAGE ESTIMATION COMPARED WITH FIXED-SAMPLE-SIZE ESTIMATION OF THE NEGATIVE BINOMIAL PARAMETER-K
    WILLSON, LJ
    FOLKS, JL
    YOUNG, JH
    [J]. BIOMETRICS, 1984, 40 (01) : 109 - 117
  • [32] Sample size calculation based on exact test for assessing differential expression analysis in RNA-seq data
    Chung-I Li
    Pei-Fang Su
    Yu Shyr
    [J]. BMC Bioinformatics, 14
  • [33] Sample size calculation based on exact test for assessing differential expression analysis in RNA-seq data
    Li, Chung-I
    Su, Pei-Fang
    Shyr, Yu
    [J]. BMC BIOINFORMATICS, 2013, 14
  • [34] scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size
    Ye, Pengchao
    Ye, Wenbin
    Ye, Congting
    Li, Shuchao
    Ye, Lishan
    Ji, Guoli
    Wu, Xiaohui
    [J]. BIOINFORMATICS, 2020, 36 (03) : 789 - 797
  • [35] Novel Application of Beta-binomial Models to Assess X Chromosome Inactivation Patterns in RNA-Seq Expression of Ovarian Tumors
    Larson, Nicholas B.
    Winham, Stacey
    Fogarty, Zach
    Larson, Melissa
    Fridley, Brooke
    Goode, Ellen L.
    [J]. GENETIC EPIDEMIOLOGY, 2015, 39 (07) : 562 - 563
  • [36] Clustering of Small-Sample Single-Cell RNA-Seq Data via Feature Clustering and Selection
    Vans, Edwin
    Sharma, Alok
    Patil, Ashwini
    Shigemizu, Daichi
    Tsunoda, Tatsuhiko
    [J]. PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2019, 11672 : 445 - 456
  • [37] Dispersion Estimation and Its Effect on Test Performance in RNA-seq Data Analysis: A Simulation-Based Comparison of Methods
    Landau, William Michael
    Liu, Peng
    [J]. PLOS ONE, 2013, 8 (12):
  • [38] dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments
    Petukhov, Viktor
    Guo, Jimin
    Baryawno, Ninib
    Severe, Nicolas
    Scadden, David T.
    Samsonova, Maria G.
    Kharchenko, Peter V.
    [J]. GENOME BIOLOGY, 2018, 19
  • [39] A Novel Bayesian Outlier Score Based on the Negative Binomial Distribution for Detecting Aberrantly Expressed Genes in RNA-Seq Gene Expression Count Data
    Salkovic, Edin
    Bensmail, Halima
    [J]. IEEE ACCESS, 2021, 9 : 75789 - 75800
  • [40] Sample size re-estimation for clinical trials with longitudinal negative binomial counts including time trends
    Asendorf, Thomas
    Henderson, Robin
    Schmidli, Heinz
    Friede, Tim
    [J]. STATISTICS IN MEDICINE, 2019, 38 (09) : 1503 - 1528