Latent Network Estimation and Variable Selection for Compositional Data Via Variational EM

Cited by: 15
Authors
Osborne, Nathan [1]
Peterson, Christine B. [2]
Vannucci, Marina [1]
Affiliations
[1] Rice Univ, Dept Stat, Houston, TX 77251 USA
[2] Univ Texas MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
Keywords
Bayesian hierarchical model; Count data; EM algorithm; Graphical model; Microbiome data; Variational inference; Bayesian inference; Probit models; Regression; Graphs; Lasso
DOI
10.1080/10618600.2021.1935971
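CLC classification (Chinese Library Classification)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714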
Abstract
Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have these two challenges been addressed simultaneously. In this article, we develop a novel method to simultaneously estimate network interactions and associations with relevant covariates for count data, and specifically for compositional data, which have a fixed-sum constraint. We use a hierarchical Bayesian model with latent layers and employ spike-and-slab priors for both edge and covariate selection. For posterior inference, we develop a novel variational inference scheme with an expectation-maximization step to enable efficient estimation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in the accuracy of network recovery. We show the practical utility of our model via an application to microbiome data. The human microbiome has been shown to contribute to many of the functions of the human body and to be linked with a number of diseases. In our application, we seek to better understand the interactions between microbes and relevant covariates, as well as the interactions of microbes with each other. We call our algorithm simultaneous inference for networks and covariates and provide a Python implementation, which is available online.
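The abstract names spike-and-slab priors for edge and covariate selection but does not state their exact form. As a generic illustration only, and not the authors' model or code, the Python sketch below assumes a continuous two-Gaussian spike-and-slab prior (a narrow "spike" N(0, v0) and a wide "slab" N(0, v1), with prior inclusion probability pi) and computes the posterior probability that a given coefficient value came from the slab. This is the kind of quantity that EM-type algorithms for such priors (for example, EMVS) evaluate in their E-step. The function names, the variances v0 and v1, and the 0.5 selection threshold are hypothetical choices.

```python
import math
from typing import List


def log_normal_density(x: float, var: float) -> float:
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * x * x / var


def inclusion_probability(beta: float, v0: float, v1: float, prior_incl: float) -> float:
    """Probability that `beta` belongs to the slab N(0, v1) rather than the
    spike N(0, v0), given prior inclusion probability `prior_incl`.
    Generic continuous spike-and-slab (hypothetical illustration)."""
    log_slab = math.log(prior_incl) + log_normal_density(beta, v1)
    log_spike = math.log(1.0 - prior_incl) + log_normal_density(beta, v0)
    d = log_spike - log_slab
    # Numerically stable logistic of -d, so math.exp never overflows.
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def select(betas: List[float], v0: float, v1: float,
           prior_incl: float, threshold: float = 0.5) -> List[bool]:
    """Flag coefficients whose slab probability exceeds `threshold`."""
    return [inclusion_probability(b, v0, v1, prior_incl) > threshold for b in betas]


if __name__ == "__main__":
    estimates = [0.0, 0.05, 0.4, -1.2]
    print(select(estimates, v0=0.01, v1=1.0, prior_incl=0.5))
```

Running it prints `[False, False, True, True]`: with v0 = 0.01, v1 = 1 and pi = 0.5, coefficients whose magnitude exceeds roughly 0.22 are more probable under the slab than under the spike. Working in log space keeps the computation stable when one component's density underflows.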
Pages: 163-175
Number of pages: 13