Zero-inflated generalized Dirichlet multinomial regression model for microbiome compositional data analysis

Cited by: 47
Authors
Tang, Zheng-Zheng [1, 2]
Chen, Guanhua [1]
Affiliations
[1] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53715 USA
[2] Wisconsin Inst Discovery, Madison, WI 53715 USA
Keywords
Compositional data analysis; Differential abundance; Hierarchical model; Microbiome; Score test; Zero-inflated model; False discovery rate; Wide association; Selection; Framework
DOI
10.1093/biostatistics/kxy025
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
There is heightened interest in using high-throughput sequencing technologies to quantify the abundances of microbial taxa and to link these abundances to human diseases and traits. Proper modeling of multivariate taxon counts is essential for the power to detect such associations. Existing models are limited in handling excessive zero observations in taxon counts and in flexibly accommodating complex correlation structures and dispersion patterns among taxa. In this article, we develop a new probability distribution, zero-inflated generalized Dirichlet multinomial (ZIGDM), that overcomes these limitations in modeling multivariate taxon counts. Based on this distribution, we propose a ZIGDM regression model to link microbial abundances to covariates (e.g. disease status) and develop a fast expectation-maximization algorithm to efficiently estimate parameters in the model. The score tests derived from this model enable us to reveal rich patterns of variation in microbial compositions, including differential mean and dispersion. The advantages of the proposed methods are demonstrated through simulation studies and an analysis of a gut microbiome dataset.
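The abstract does not state the ZIGDM probability mass function. As a hedged sketch only, the code below assumes the standard stick-breaking representation of the generalized Dirichlet multinomial, in which taxon counts Y_1, ..., Y_K are drawn one at a time from beta-binomials on the count that remains, and assumes zero inflation enters by letting the j-th breaking proportion be exactly zero with probability π_j. The function names `zigdm_logpmf` and `ab_from_mean_dispersion`, and the mean/dispersion mapping a = μ/σ, b = (1 − μ)/σ, are hypothetical illustrations, not the authors' code; the paper should be consulted for the exact parameterization and the regression links used in the EM algorithm.

```python
import math
from typing import Sequence, Tuple


def ab_from_mean_dispersion(mu: float, sigma: float) -> Tuple[float, float]:
    """Map an assumed (mean, dispersion) pair to beta shape parameters.

    Assumes mu = a / (a + b) and sigma = 1 / (a + b), so a = mu / sigma and
    b = (1 - mu) / sigma. This parameterization is illustrative only.
    """
    if not (0.0 < mu < 1.0 and sigma > 0.0):
        raise ValueError("need 0 < mu < 1 and sigma > 0")
    return mu / sigma, (1.0 - mu) / sigma


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed from log-gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_betabinom(y: int, n: int, a: float, b: float) -> float:
    """Log pmf of Beta-Binomial(n, a, b) evaluated at y."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def zigdm_logpmf(counts: Sequence[int], a: Sequence[float],
                 b: Sequence[float], pi: Sequence[float]) -> float:
    """Hypothetical ZIGDM log-likelihood for one sample of K taxon counts.

    a, b, pi have length K - 1: one zero-inflated beta-binomial step per
    taxon except the last, whose count is whatever remains. pi[j] is the
    assumed probability that the j-th breaking proportion is a structural zero.
    """
    K = len(counts)
    if not (len(a) == len(b) == len(pi) == K - 1):
        raise ValueError("a, b, pi must each have length len(counts) - 1")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    if any(p < 0.0 or p >= 1.0 for p in pi):
        raise ValueError("each pi[j] must lie in [0, 1)")
    remaining = sum(counts)
    total = 0.0
    for j in range(K - 1):
        y = counts[j]
        bb = log_betabinom(y, remaining, a[j], b[j])
        if y == 0 and pi[j] > 0.0:
            # A zero can come from the structural-zero component or from
            # the beta-binomial itself, so the two probabilities are added.
            total += math.log(pi[j] + (1.0 - pi[j]) * math.exp(bb))
        else:
            # Positive counts (or pi[j] == 0) arise only from the beta-binomial.
            total += math.log1p(-pi[j]) + bb
        remaining -= y
    return total
```

Because each step is a properly normalized zero-inflated beta-binomial, summing the pmf over all count vectors with a fixed total gives 1, which is a quick sanity check. In a regression version, μ_j, σ_j, and π_j would each be tied to covariates through link functions (for example logit for μ_j and π_j and log for σ_j); those link choices are likewise assumptions here.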
Pages: 698–713
Number of pages: 16