A Bayesian zero-inflated Dirichlet-multinomial regression model for multivariate compositional count data

Cited by: 2
Author
Koslovsky, Matthew D. [1]
Affiliation
[1] Colorado State Univ, Dept Stat, Ft Collins, CO 80523 USA
Funding
US National Science Foundation
Keywords
data augmentation; microbiome; sparse; variable selection; zero-inflation; VARIABLE SELECTION; MICROBIOME DATA; ASSOCIATIONS
DOI
10.1111/biom.13853
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline code
07; 0710; 09
Abstract
The Dirichlet-multinomial (DM) distribution plays a fundamental role in modern statistical methodology development and application. Recently, the DM distribution and its variants have been used extensively to model multivariate count data generated by high-throughput sequencing technology in omics research, owing to their ability to accommodate both the compositional structure of the data and overdispersion. A major limitation of the DM distribution is that it cannot handle the excess zeros typically found in practice, which may bias inference. To fill this gap, we propose a novel Bayesian zero-inflated DM model for multivariate compositional count data with excess zeros. We then extend our approach to regression settings and embed sparsity-inducing priors to perform variable selection in high-dimensional covariate spaces. Throughout, modeling decisions are made to boost scalability without sacrificing interpretability or imposing limiting assumptions. Extensive simulations and an application to a human gut microbiome dataset are presented to compare the performance of the proposed method with existing approaches. We provide an accompanying R package with a user-friendly vignette for applying our method to other datasets.
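The abstract describes the model only in words. The following standard-library Python sketch illustrates one common way to write a zero-inflated DM regression; it is an illustration under stated assumptions, not the paper's exact parameterization and not the API of its R package. The DM log-probability is the standard closed form. Everything else is assumed: the log link for the DM concentration parameters, the logit link for the per-taxon zero-inflation probabilities, the "structural zero" construction (an absent taxon gets zero weight before the Dirichlet draw), and the hypothetical function names `dm_logpmf`, `sample_zidm`, and `zidm_regression_params`. The point it shows: under plain DM a zero count arises only by sampling chance, whereas an absent taxon here yields a zero regardless of sequencing depth.

```python
import math
import random


def dm_logpmf(counts, alpha):
    """Standard Dirichlet-multinomial log-probability of a count vector."""
    n, a_sum = sum(counts), sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a_sum) - math.lgamma(n + a_sum)
    for y, a in zip(counts, alpha):
        out += math.lgamma(y + a) - math.lgamma(y + 1) - math.lgamma(a)
    return out


def sample_zidm(n_reads, alpha, zero_prob, rng):
    """One draw from an ASSUMED zero-inflated Dirichlet-multinomial.

    Taxon j is structurally absent with probability zero_prob[j]; present
    taxa get a Gamma(alpha[j], 1) weight. Normalized gammas form a Dirichlet
    composition over the present taxa, and reads are allocated multinomially.
    """
    while True:
        present = [rng.random() >= p for p in zero_prob]
        weights = [rng.gammavariate(a, 1.0) if keep else 0.0
                   for a, keep in zip(alpha, present)]
        if sum(weights) > 0:  # redraw if every taxon was zeroed out
            break
    counts = [0] * len(alpha)
    # choices() normalizes the weights itself, i.e. the composition step.
    for j in rng.choices(range(len(alpha)), weights=weights, k=n_reads):
        counts[j] += 1
    return counts, present


def linear(intercept, coefs, x):
    """Intercept plus dot product of coefficients and covariates."""
    return intercept + sum(b * xk for b, xk in zip(coefs, x))


def zidm_regression_params(x, beta0, beta, gamma0, gamma):
    """Map covariates x to per-taxon concentrations and zero probabilities.

    HYPOTHETICAL links: log link for alpha, logit link for zero_prob.
    Rows of beta/gamma that are mostly 0.0 mimic the sparsity that
    variable selection would aim to recover.
    """
    alpha = [math.exp(linear(b0, b, x)) for b0, b in zip(beta0, beta)]
    zero_prob = [1.0 / (1.0 + math.exp(-linear(g0, g, x)))
                 for g0, g in zip(gamma0, gamma)]
    return alpha, zero_prob


if __name__ == "__main__":
    rng = random.Random(1)
    x = [1.0, -0.5]  # two hypothetical covariates
    beta0, beta = [0.5, 0.0, -0.5], [[0.8, 0.0], [0.0, 0.0], [0.0, -1.2]]
    gamma0, gamma = [-2.0, -1.0, -2.0], [[0.0, 0.0], [0.0, 1.5], [0.0, 0.0]]
    alpha, zero_prob = zidm_regression_params(x, beta0, beta, gamma0, gamma)
    counts, present = sample_zidm(100, alpha, zero_prob, rng)
    print(counts, present)
```

Setting every `zero_prob` entry to 0 recovers an ordinary DM draw. The sparse coefficient rows only stand in for what sparsity-inducing priors would try to recover; the sketch performs no posterior inference or variable selection.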
Pages: 3239-3251
Number of pages: 13
Related papers (50 in total; first 10 listed)
  • [1] Zero-inflated generalized Dirichlet multinomial regression model for microbiome compositional data analysis
    Tang, Zheng-Zheng
    Chen, Guanhua
    [J]. BIOSTATISTICS, 2019, 20 (04) : 698 - 713
  • [2] Infants' gut microbiome data: A Bayesian Marginal Zero-inflated Negative Binomial regression model for multivariate analyses of count data
    Hajihosseini, Morteza
    Amini, Payam
    Saidi-Mehrabad, Alireza
    Dinu, Irina
    [J]. COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2023, 21 : 1621 - 1629
  • [3] The analysis of zero-inflated count data: Beyond zero-inflated Poisson regression
    Loeys, Tom
    Moerkerke, Beatrijs
    De Smet, Olivia
    Buysse, Ann
    [J]. BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY, 2012, 65 (01) : 163 - 180
  • [4] Bayesian semiparametric zero-inflated Poisson model for longitudinal count data
    Dagne, Getachew A.
    [J]. MATHEMATICAL BIOSCIENCES, 2010, 224 (02) : 126 - 130
  • [5] Clustering multivariate count data via Dirichlet-multinomial network fusion
    Zhao, Xin
    Zhang, Jingru
    Lin, Wei
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2023, 179
  • [6] Marginal zero-inflated regression models for count data
    Martin, Jacob
    Hall, Daniel B.
    [J]. JOURNAL OF APPLIED STATISTICS, 2017, 44 (10) : 1807 - 1826
  • [7] Zero-inflated Bell regression models for count data
    Lemonte, Artur J.
    Moreno-Arenas, German
    Castellares, Fredy
    [J]. JOURNAL OF APPLIED STATISTICS, 2020, 47 (02) : 265 - 286
  • [8] A bivariate zero-inflated count data regression model with unrestricted correlation
    Gurmu, Shiferaw
    Elder, John
    [J]. ECONOMICS LETTERS, 2008, 100 (02) : 245 - 248
  • [9] A Bayesian nonparametric analysis for zero-inflated multivariate count data with application to microbiome study
    Shuler, Kurtis
    Verbanic, Samuel
    Chen, Irene A.
    Lee, Juhee
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2021, 70 (04) : 961 - 979
  • [10] Bayesian variable selection for multivariate zero-inflated models: Application to microbiome count data
    Lee, Kyu Ha
    Coull, Brent A.
    Moscicki, Anna-Barbara
    Paster, Bruce J.
    Starr, Jacqueline R.
    [J]. BIOSTATISTICS, 2020, 21 (03) : 499 - 517