Modeling SAGE data with a truncated gamma-Poisson model

Times cited: 20
Authors
Thygesen, Helene H. [1]
Zwinderman, Aeilko H. [1]
Affiliation
[1] Univ Amsterdam, Acad Med Ctr, NL-1100 DD Amsterdam, Netherlands
DOI
10.1186/1471-2105-7-157
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Serial Analysis of Gene Expression (SAGE) produces gene expression measurements on a discrete scale, because the sample contains a finite number of molecules. Part of the variance in SAGE data should therefore be understood as sampling error from a binomial or Poisson distribution, whereas other sources of variance, in particular biological variance, should be modeled with a continuous distribution function, i.e., a prior on the intensity of the Poisson distribution. One challenge is that such a model predicts a large number of genes with zero counts, which cannot be observed.

Results: We present a hierarchical Poisson model with a gamma prior, together with three different algorithms for estimating its parameters. The rate parameter of the gamma distribution can be estimated from a single SAGE library, whereas the estimate of the shape parameter is unstable. Consequently, the number of zero counts cannot be estimated reliably. When a bivariate model is applied to two SAGE libraries, however, the number of predicted zero counts becomes more stable and agrees approximately with the number of transcripts observed across a large number of experiments. In all the libraries we analyzed there was a small population of very highly expressed tags, typically 1% of the tags, that the model could not account for. To handle these tags, we augmented the model with a non-parametric component. We also show some results based on a log-normal distribution instead of the gamma distribution.

Conclusion: By modeling SAGE data with a hierarchical Poisson model, it is possible to separate the sampling variance from the variance in gene expression. If expression levels are reported at the gene level rather than at the tag level, genes mapped to multiple tags must be kept separate, since their expression levels show a different statistical behavior. A log-normal prior provided a better fit to our data than the gamma prior, but apart from a small subpopulation of tags with very high counts, the two priors are similar.
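A minimal sketch of a zero-truncated gamma-Poisson likelihood may help make the truncation concrete. It assumes the textbook setup in which each tag count Y is Poisson with intensity λ, and λ follows a gamma prior with shape a and rate b, so the marginal count distribution is negative binomial. Zero counts are never observed, so the likelihood is conditioned on Y > 0. The function names, the crude grid-search optimiser, and the omission of any library-size scaling are assumptions made for illustration only; this is not one of the three estimation algorithms described in the paper.

```python
import math
from typing import Sequence, Tuple


def log_prob_zero(shape: float, rate: float) -> float:
    """log P(Y = 0) = a * log(b / (b + 1)), written as -a * log1p(1 / b)."""
    return -shape * math.log1p(1.0 / rate)


def prob_zero(shape: float, rate: float) -> float:
    """Probability mass at zero, i.e. the part of the distribution never observed."""
    return math.exp(log_prob_zero(shape, rate))


def log_nb_pmf(y: int, shape: float, rate: float) -> float:
    """log P(Y = y) for Y | lam ~ Poisson(lam), lam ~ Gamma(shape=a, rate=b).

    Integrating out lam gives the negative binomial
    P(Y = y) = Gamma(y + a) / (Gamma(a) * y!) * (b / (b + 1))**a * (1 / (b + 1))**y.
    """
    return (math.lgamma(y + shape) - math.lgamma(shape) - math.lgamma(y + 1)
            + log_prob_zero(shape, rate) - y * math.log1p(rate))


def truncated_loglik(counts: Sequence[int], shape: float, rate: float) -> float:
    """Log-likelihood of strictly positive counts under the zero-truncated model."""
    # log P(Y > 0) = log(1 - P0), computed stably with expm1
    log_pos = math.log(-math.expm1(log_prob_zero(shape, rate)))
    return sum(log_nb_pmf(y, shape, rate) for y in counts) - len(counts) * log_pos


def fit_truncated(counts: Sequence[int], steps: int = 4,
                  grid: int = 21) -> Tuple[float, float]:
    """Hypothetical crude ML fit of (shape, rate) by coarse-to-fine grid search.

    Searches on the log scale, starting from [-4, 4] x [-4, 4] around (0, 0)
    and shrinking the window around the best point after each pass.
    """
    best_ll, best_a, best_b = -math.inf, 1.0, 1.0
    center_a, center_b, half = 0.0, 0.0, 4.0
    for _ in range(steps):
        for i in range(grid):
            for j in range(grid):
                a = math.exp(center_a - half + 2.0 * half * i / (grid - 1))
                b = math.exp(center_b - half + 2.0 * half * j / (grid - 1))
                ll = truncated_loglik(counts, a, b)
                if ll > best_ll:
                    best_ll, best_a, best_b = ll, a, b
        center_a, center_b = math.log(best_a), math.log(best_b)
        half /= 4.0
    return best_a, best_b


def expected_unobserved(n_observed: int, shape: float, rate: float) -> float:
    """Implied number of zero-count tags.

    If N tags exist in total, n_observed = N * (1 - P0), so the hidden zeros
    number N * P0 = n_observed * P0 / (1 - P0).
    """
    lp0 = log_prob_zero(shape, rate)
    return n_observed * math.exp(lp0) / -math.expm1(lp0)


if __name__ == "__main__":
    counts = [1, 1, 1, 2, 1, 4, 1, 2, 9, 1, 3, 1, 27]  # hypothetical tag counts
    a, b = fit_truncated(counts)
    print(a, b, expected_unobserved(len(counts), a, b))
```

Because the abstract reports that the shape parameter is unstable when estimated from a single library, the value returned by `fit_truncated`, and with it the output of `expected_unobserved`, should be read as illustrative rather than as a reliable count of unobserved transcripts.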
Pages: 9