BAYESIAN MULTIVARIATE SPARSE FUNCTIONAL PRINCIPAL COMPONENTS ANALYSIS WITH APPLICATION TO LONGITUDINAL MICROBIOME MULTIOMICS DATA

Cited by: 3
Authors
Jiang, Lingjing [1 ]
Elrod, Chris [2 ]
Kim, Jane J. [3 ]
Swafford, Austin D. [4 ]
Knight, Rob [5 ]
Thompson, Wesley K. [1 ]
Affiliations
[1] Univ Calif San Diego, Herbert Wertheim Sch Publ Hlth & Human Longev Sci, La Jolla, CA 92093 USA
[2] Julia Comp, Boston, MA USA
[3] Univ Calif San Diego, Dept Pediat, La Jolla, CA 92093 USA
[4] Univ Calif San Diego, Ctr Microbiome Innovat, La Jolla, CA 92093 USA
[5] Univ Calif San Diego, Dept Pediat, Ctr Microbiome Innovat, Dept Comp Sci & Engn,Dept Bioengn, La Jolla, CA 92093 USA
Source
ANNALS OF APPLIED STATISTICS | 2022 / Vol. 16 / Issue 04
Keywords
Bayesian; functional data analysis; longitudinal; microbiome; multiomics; GUT MICROBIOME; INFECTION; RATES; OMICS
DOI
10.1214/21-AOAS1587
Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Microbiome researchers often need to model the temporal dynamics of multiple complex, nonlinear outcome trajectories simultaneously. This motivates our development of multivariate Sparse Functional Principal Components Analysis (mSFPCA), extending existing SFPCA methods to simultaneously characterize multiple temporal trajectories and their interrelationships. As with existing SFPCA methods, the mSFPCA algorithm characterizes each trajectory as a smooth mean plus a weighted combination of the smooth major modes of variation about the mean, where the weights are given by the component scores for each subject. Unlike existing SFPCA methods, the mSFPCA algorithm allows estimation of multiple trajectories simultaneously, such that the component scores, which are constrained to be independent within a particular outcome for identifiability, may be arbitrarily correlated with component scores for other outcomes. A Cholesky decomposition is used to estimate the component score covariance matrix efficiently and guarantee positive semidefiniteness given these constraints. Mutual information is used to assess the strength of marginal and conditional temporal associations across outcome trajectories. Importantly, we implement mSFPCA as a Bayesian algorithm using R and stan, enabling easy use of packages such as PSIS-LOO for model selection and graphical posterior predictive checks to assess the validity of mSFPCA models. Although we focus on application of mSFPCA to microbiome data in this paper, the mSFPCA model is of general utility and can be used in a wide range of real-world applications.
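The two computational ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' R/Stan implementation: it assumes jointly Gaussian component scores, uses the closed-form Gaussian mutual information, and does not reproduce the within-outcome independence constraints described above. The function names and the 4 x 4 example are hypothetical.

```python
import numpy as np

def covariance_from_cholesky(lower):
    """Build Sigma = L L^T from the lower triangle of `lower`.

    Any matrix of this form is positive semidefinite, which is why a
    Cholesky parameterization is convenient for covariance estimation.
    """
    L = np.tril(lower)
    return L @ L.T

def gaussian_mutual_information(sigma, idx_a, idx_b):
    """Mutual information (in nats) between two blocks of a Gaussian vector.

    Assumption: the scores are jointly Gaussian, so
    MI = 0.5 * (log det Sigma_A + log det Sigma_B - log det Sigma_AB).
    """
    idx_ab = list(idx_a) + list(idx_b)
    _, logdet_a = np.linalg.slogdet(sigma[np.ix_(idx_a, idx_a)])
    _, logdet_b = np.linalg.slogdet(sigma[np.ix_(idx_b, idx_b)])
    _, logdet_ab = np.linalg.slogdet(sigma[np.ix_(idx_ab, idx_ab)])
    return 0.5 * (logdet_a + logdet_b - logdet_ab)

# Hypothetical example: two outcomes with two component scores each.
rng = np.random.default_rng(0)
sigma = covariance_from_cholesky(rng.normal(size=(4, 4)))
min_eig = np.linalg.eigvalsh(sigma).min()  # non-negative up to rounding
mi = gaussian_mutual_information(sigma, [0, 1], [2, 3])
```

In this sketch, a mutual information near zero would indicate that the two outcomes' score blocks are close to independent, while larger values indicate stronger association.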
Pages: 2231 - 2249
Page count: 19
Related papers
50 in total
  • [41] Diagnostics in multivariate data analysis: Sensitivity analysis for principal components and canonical correlations
    Tanaka, Y
    Zhang, F
    Yang, W
    [J]. EXPLORATORY DATA ANALYSIS IN EMPIRICAL RESEARCH, PROCEEDINGS, 2003, : 170 - 179
  • [42] Longitudinal Principal Component Analysis With an Application to Marketing Data
    Kinson, Christopher
    Tang, Xiwei
    Zuo, Zhen
    Qu, Annie
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (02) : 335 - 350
  • [43] Bayesian functional joint models for multivariate longitudinal and time-to-event data
    Li, Kan
    Luo, Sheng
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2019, 129 : 14 - 29
  • [44] Functional principal components analysis on moving time windows of longitudinal data: dynamic prediction of times to event
    Yan, Fangrong
    Lin, Xiao
    Li, Ruosha
    Huang, Xuelin
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2018, 67 (04) : 961 - 978
  • [45] A Bayesian model for sparse functional data
    Thompson, Wesley K.
    Rosen, Ori
    [J]. BIOMETRICS, 2008, 64 (01) : 54 - 63
  • [46] Bayesian Joint Modeling of Multivariate Longitudinal and Survival Data With an Application to Diabetes Study
    Huang, Yangxin
    Chen, Jiaqing
    Xu, Lan
    Tang, Nian-Sheng
    [J]. FRONTIERS IN BIG DATA, 2022, 5
  • [47] A SEMIPARAMETRIC BAYESIAN APPROACH TO MULTIVARIATE LONGITUDINAL DATA
    Ghosh, Pulak
    Hanson, Timothy
    [J]. AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2010, 52 (03) : 275 - 288
  • [48] Bayesian consensus clustering for multivariate longitudinal data
    Lu, Zihang
    Lou, Wendy
    [J]. STATISTICS IN MEDICINE, 2022, 41 (01) : 108 - 127
  • [49] Functional principal component analysis for longitudinal data with informative dropout
    Shi, Haolun
    Dong, Jianghu
    Wang, Liangliang
    Cao, Jiguo
    [J]. STATISTICS IN MEDICINE, 2021, 40 (03) : 712 - 724
  • [50] Properties of principal component methods for functional and longitudinal data analysis
    Hall, Peter
    Mueller, Hans-Georg
    Wang, Jane-Ling
    [J]. ANNALS OF STATISTICS, 2006, 34 (03) : 1493 - 1517