BAYESIAN MODELING OF INTERACTION BETWEEN FEATURES IN SPARSE MULTIVARIATE COUNT DATA WITH APPLICATION TO MICROBIOME STUDY

Authors
Zhang, Shuangjie [1 ]
Shen, Yuning [2 ]
Chen, Irene A. [2 ]
Lee, Juhee [1 ]
Affiliations
[1] Univ Calif Santa Cruz, Dept Stat, Santa Cruz, CA 95064 USA
[2] Univ Calif Los Angeles, Dept Chem & Biomol Engn, Los Angeles, CA USA
Source
ANNALS OF APPLIED STATISTICS | 2023 / Vol. 17 / No. 03
Keywords
Covariance matrix; differential abundance; factor model; joint sparsity; kernel model; zero inflation; multivariate count data; MULTINOMIAL REGRESSION-MODEL; POSTERIOR CONTRACTION; COMPOSITIONAL DATA; COVARIANCE; RATES;
DOI
10.1214/22-AOAS1690
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
Many statistical methods have been developed for the analysis of microbial community profiles, but due to the complexity of typical microbiome measurements, inference of interactions between microbial features remains challenging. We develop a Bayesian zero-inflated rounded log-normal kernel method to model interaction between microbial features in a community using multivariate count data in the presence of covariates and excess zeros. The model carefully constructs the interaction structure by imposing joint sparsity on the covariance matrix of the kernel and obtains a reliable estimate of the structure with a small sample size. The model also includes zero inflation to account for excess zeros observed in data and infers differential abundance of microbial features associated with covariates through log-linear regression. We provide simulation studies and real data analysis examples to demonstrate the developed model. Comparison of the model to a simpler model and popular alternatives in simulation studies shows that, in addition to an added and important insight on the feature interaction, it yields superior parameter estimates and model fit in various settings.
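The data-generating idea described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model or code: the function name `simulate_counts`, the floor-of-exponential rounding rule, the factor-style covariance `loadings @ loadings.T + 0.25 * I`, the single binary covariate, and all numeric settings are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate_counts(n=50, p=6, k=2, zero_prob=0.3, seed=0):
    """Simulate zero-inflated counts from a rounded log-normal kernel.

    Hypothetical illustration only; all settings are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Binary covariate (e.g. two experimental conditions); an assumption.
    x = rng.integers(0, 2, size=n)

    # Log-linear regression: feature-specific intercepts and slopes.
    # Only the first two features are differentially abundant here.
    intercept = rng.normal(2.0, 0.5, size=p)
    slope = np.zeros(p)
    slope[:2] = 1.0
    mean = intercept + np.outer(x, slope)  # shape (n, p)

    # Sparse factor loadings induce a structured covariance between features.
    loadings = rng.normal(0.0, 1.0, size=(p, k))
    loadings[rng.random((p, k)) < 0.5] = 0.0
    cov = loadings @ loadings.T + 0.25 * np.eye(p)

    # Latent log-scale abundances with correlated noise.
    latent = mean + rng.multivariate_normal(np.zeros(p), cov, size=n)

    # Rounding step (assumed rule): count = floor(exp(latent)).
    counts = np.floor(np.exp(latent)).astype(np.int64)

    # Zero inflation: each entry is independently replaced by an excess zero.
    excess_zero = rng.random((n, p)) < zero_prob
    counts[excess_zero] = 0

    return counts, x, cov


if __name__ == "__main__":
    counts, x, cov = simulate_counts()
    print("count matrix shape:", counts.shape)
    print("fraction of zeros:", (counts == 0).mean())
```

Off-diagonal entries of `cov` that are exactly zero correspond to feature pairs with no modeled interaction, which is one simple way to picture a sparse interaction structure. Inference (the hard part the paper addresses) is not attempted here.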
Pages: 1861 - 1883
Page count: 23
Related papers
50 in total
  • [1] A Bayesian nonparametric analysis for zero-inflated multivariate count data with application to microbiome study
    Shuler, Kurtis
    Verbanic, Samuel
    Chen, Irene A.
    Lee, Juhee
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2021, 70 (04) : 961 - 979
  • [2] Bayesian Modeling on Microbiome Data Analysis: Application to Subgingival Microbiome Study
    Gwon, Yeongjin
    Yu, Fang
    Payne, Jeffrey B.
    Mikuls, Ted R.
    [J]. STATISTICS IN BIOSCIENCES, 2023,
  • [3] Bayesian variable selection for multivariate zero-inflated models: Application to microbiome count data
    Lee, Kyu Ha
    Coull, Brent A.
    Moscicki, Anna-Barbara
    Paster, Bruce J.
    Starr, Jacqueline R.
    [J]. BIOSTATISTICS, 2020, 21 (03) : 499 - 517
  • [4] BAYESIAN MULTIVARIATE SPARSE FUNCTIONAL PRINCIPAL COMPONENTS ANALYSIS WITH APPLICATION TO LONGITUDINAL MICROBIOME MULTIOMICS DATA
    Jiang, Lingjing
    Elrod, Chris
    Kim, Jane J.
    Swafford, Austin D.
    Knight, Rob
    Thompson, Wesley K.
    [J]. ANNALS OF APPLIED STATISTICS, 2022, 16 (04) : 2231 - 2249
  • [5] Bayesian Sparse Multivariate Regression with Asymmetric Nonlocal Priors for Microbiome Data Analysis
    Shuler, Kurtis
    Sison-Mangus, Marilou
    Lee, Juhee
    [J]. BAYESIAN ANALYSIS, 2020, 15 (02) : 559 - 578
  • [6] Bayesian Joint Modeling of Multivariate Longitudinal and Survival Data With an Application to Diabetes Study
    Huang, Yangxin
    Chen, Jiaqing
    Xu, Lan
    Tang, Nian-Sheng
    [J]. FRONTIERS IN BIG DATA, 2022, 5
  • [7] Sparse Bayesian modelling of underreported count data
    Dvorzak, Michaela
    Wagner, Helga
    [J]. STATISTICAL MODELLING, 2016, 16 (01) : 24 - 46
  • [8] Sequential Bayesian Analysis of Multivariate Count Data
    Aktekin, Tevfik
    Polson, Nick
    Soyer, Refik
    [J]. BAYESIAN ANALYSIS, 2018, 13 (02) : 385 - 409
  • [9] Bayesian inference on quasi-sparse count data
    Datta, Jyotishka
    Dunson, David B.
    [J]. BIOMETRIKA, 2016, 103 (04) : 971 - 983
  • [10] Modeling Multivariate Count Data Using Copulas
    Nikoloulopoulos, Aristidis K.
    Karlis, Dimitris
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2010, 39 (01) : 172 - 187