A Bayesian analysis of frequency count data

Cited by: 1
Authors
Font, M. [1]
Puig, X. [1]
Ginebra, J. [1]
Affiliations
[1] Tech Univ Catalonia, Dept Stat, Barcelona 08028, Spain
Keywords
diversity; inverse Gaussian; Poisson mixture; population size; Sichel model; species frequency; textual data; vocabulary distribution; LITERARY-STYLE; DIVERSITY
DOI
10.1080/00949655.2011.600311
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The zero truncated inverse Gaussian-Poisson model, obtained by first mixing the Poisson model under the assumption that its expected value has an inverse Gaussian distribution and then truncating the resulting mixture at zero, is very useful for modelling frequency count data. A Bayesian analysis based on this model is implemented on the word frequency counts of various texts, and its validity is checked by exploring the posterior distribution of the Pearson errors and by implementing posterior predictive consistency checks. The analysis is useful because it allows one to use the posterior distribution of the model's mixing density as an approximation of the posterior distribution of the density of the word frequencies of the author's vocabulary, which helps to characterize that author's style. The posterior distribution of the expectation and of measures of the variability of that mixing distribution can be used to assess the size and diversity of the author's vocabulary. An alternative analysis is proposed based on the inverse Gaussian-zero truncated Poisson mixture model, which is obtained by switching the order of the mixing and truncation stages. Even though this second model fits some of the word frequency data sets more accurately than the first, in practice the analysis based on it is less useful because it does not allow one to estimate the word frequency distribution of the vocabulary.
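The difference between the two constructions named in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' implementation: the function names, the parameter values (`mu`, `lam`), and the choice of samplers (the Michael-Schucany-Haas method for the inverse Gaussian, Knuth's multiplication method for the Poisson) are all assumptions made for illustration. The only difference between the two models in the sketch is whether the Poisson rate is redrawn when a zero count is rejected.

```python
import math
import random


def draw_inverse_gaussian(mu, lam, rng):
    """One inverse Gaussian draw with mean mu and shape lam
    (Michael-Schucany-Haas transformation method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    root = math.sqrt(4.0 * mu * lam * y + (mu * y) ** 2)
    x = mu + (mu * mu * y) / (2.0 * lam) - (mu / (2.0 * lam)) * root
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def draw_poisson(rate, rng):
    """One Poisson draw by Knuth's multiplication method
    (adequate for the moderate rates used here)."""
    limit = math.exp(-rate)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def zero_truncated_ig_poisson(mu, lam, rng):
    """Mix first, truncate second: a zero count rejects the
    (rate, count) pair, so the rate is redrawn as well."""
    while True:
        x = draw_poisson(draw_inverse_gaussian(mu, lam, rng), rng)
        if x > 0:
            return x


def ig_zero_truncated_poisson(mu, lam, rng):
    """Truncate first, mix second: the rate is drawn once and only
    the count is redrawn until it is positive (slow if rate is near 0)."""
    rate = draw_inverse_gaussian(mu, lam, rng)
    while True:
        x = draw_poisson(rate, rng)
        if x > 0:
            return x


# Hypothetical parameter values, chosen only for the demonstration.
rng = random.Random(0)
first = [zero_truncated_ig_poisson(2.0, 1.0, rng) for _ in range(5000)]
second = [ig_zero_truncated_poisson(2.0, 1.0, rng) for _ in range(5000)]
print("share of ones, mix then truncate:", first.count(1) / len(first))
print("share of ones, truncate then mix:", second.count(1) / len(second))
```

In the first sampler the rejected rates are discarded together with their zero counts, so the rates that survive are no longer distributed as the original inverse Gaussian; in the second sampler every drawn rate is kept. The sketch only shows that the two orderings define different sampling schemes and makes no claim about which fits a given data set better.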
Pages: 229-246
Page count: 18