Dealing with overdispersion in multivariate count data

Cited by: 4
Authors
Corsini, Noemi [1 ]
Viroli, Cinzia [1 ]
Affiliations
[1] Univ Bologna, Dept Stat Sci, via Belle Arti 41, I-40126 Bologna, Italy
Keywords
Extra-variation; Mixture models; Deep learning; Maximum likelihood; FINITE MIXTURE DISTRIBUTION; ZERO-INFLATED POISSON; REGRESSION; MODEL;
DOI
10.1016/j.csda.2022.107447
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Overdispersion in multivariate count data is a challenging problem. It plays a central role mainly because of the relevance of modern technology-based data, such as Next Generation Sequencing data and textual data from the web or digital collections. A comprehensive analysis of the likelihood-based models for extra-variation data is presented, with particular attention to the models that are feasible for high-dimensional data. A new approach is proposed, together with its parameter estimation procedure. It can be viewed as a deeper version of the Dirichlet-Multinomial distribution, and it leads to important results that allow a better approximation of the observed variability. The proposed model is compared with existing strategies through two different simulation studies and an empirical data set, which confirm its better capability to describe overdispersion. (C) 2022 Elsevier B.V. All rights reserved.
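As a point of reference for the abstract, the following minimal sketch illustrates what overdispersion means for the standard Dirichlet-Multinomial distribution, which the abstract names as the baseline. It does not implement the authors' proposed deeper model. The parameter values (`alpha`, `n_trials`, the sample size) and the helper names are assumptions chosen only for illustration. The sketch relies on the well-known result that each Dirichlet-Multinomial count has the multinomial variance multiplied by (n + α₀) / (1 + α₀), where α₀ is the sum of the Dirichlet parameters.

```python
import numpy as np


def dm_variance_inflation(n_trials, alpha):
    """Ratio of Dirichlet-Multinomial variance to multinomial variance.

    For counts with n_trials draws and Dirichlet parameters alpha,
    Var_DM(X_j) = Var_Mult(X_j) * (n_trials + alpha0) / (1 + alpha0),
    where alpha0 = sum(alpha).
    """
    alpha0 = float(np.sum(alpha))
    return (n_trials + alpha0) / (1.0 + alpha0)


def sample_dirichlet_multinomial(rng, n_trials, alpha, size):
    """Draw `size` count vectors: first p ~ Dirichlet(alpha), then X ~ Mult(n, p)."""
    probs = rng.dirichlet(alpha, size=size)
    return np.array([rng.multinomial(n_trials, p) for p in probs])


# Illustrative (assumed) settings, not taken from the paper.
rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])
n_trials = 50
n_samples = 20000

# Multinomial counts with the same mean proportions as the Dirichlet.
mean_probs = alpha / alpha.sum()
mult_counts = rng.multinomial(n_trials, mean_probs, size=n_samples)
dm_counts = sample_dirichlet_multinomial(rng, n_trials, alpha, n_samples)

# Empirical per-category variance ratio versus the theoretical factor.
empirical_ratio = dm_counts.var(axis=0) / mult_counts.var(axis=0)
theoretical_ratio = dm_variance_inflation(n_trials, alpha)

print("empirical variance ratio per category:", empirical_ratio)
print("theoretical inflation factor:", theoretical_ratio)
```

Both samples share the same expected counts, so any gap in variance is extra-variation that a plain multinomial cannot represent. Smaller values of α₀ give a larger inflation factor.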
Pages: 13
Related papers
50 in total
  • [31] Splitting models for multivariate count data
    Peyhardi, Jean
    Fernique, Pierre
    Durand, Jean-Baptiste
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2021, 181
  • [32] Local influence diagnostics for hierarchical count data models with overdispersion and excess zeros
    Rakhmawati, Trias Wahyuni
    Molenberghs, Geert
    Verbeke, Geert
    Faes, Christel
    [J]. BIOMETRICAL JOURNAL, 2016, 58 (06) : 1390 - 1408
  • [33] Diagnostic tools for a multivariate negative binomial model for fitting correlated data with overdispersion
    Fabio, Lizandra C.
    Villegas, Cristian
    Carrasco, Jalmar M. F.
    de Castro, Mario
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2023, 52 (06) : 1833 - 1853
  • [34] Random Forests in Count Data Modelling: An Analysis of the Influence of Data Features and Overdispersion on Regression Performance
    Mushagalusa, Ciza Arsene
    Fandohan, Adande Belarmain
    Glele Kakai, Romain
    [J]. JOURNAL OF PROBABILITY AND STATISTICS, 2022, 2022
  • [35] overdisp: an R package for direct detection of overdispersion in count data multiple regression analysis
    de Freitas Souza, Rafael
    Fávero, Luiz Paulo
    Belfiore, Patrícia
    Corrêa, Hamilton Luiz
    [J]. International Journal of Business Intelligence and Data Mining, 2022, 20 (03) : 327 - 344
  • [36] Population trends from count data: Handling environmental bias, overdispersion and excess of zeroes
    Tirozzi, Pietro
    Orioli, Valerio
    Dondina, Olivia
    Kataoka, Leila
    Bani, Luciano
    [J]. ECOLOGICAL INFORMATICS, 2022, 69
  • [37] Score tests for zero-inflation and overdispersion in two-level count data
    Lim, Hwa Kyung
    Song, Juwon
    Jung, Byoung Cheol
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2013, 61 : 67 - 82
  • [38] Overdispersion tests in count-data analysis (vol 103, pg 145, 2008)
    Vives, J.
    Losilla, J-M
    Rodrigo, M-F
    Portell, M.
    Llorens, M.
    [J]. PSYCHOLOGICAL REPORTS, 2013, 113 (02) : 683 - 683
  • [39] Latent feature regression for multivariate count data
    Klami, Arto
    Tripathi, Abhishek
    Sirola, Johannes
    Vare, Lauri
    Roulland, Frederic
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 462 - 470
  • [40] Multivariate Areal Interpolation for Continuous and Count Data
    Krivoruchko, Konstantin
    Gribov, Alexander
    Krause, Eric
    [J]. 1ST CONFERENCE ON SPATIAL STATISTICS 2011 - MAPPING GLOBAL CHANGE, 2011, 3 : 14 - 19