Bayesian Model Selection via Composite Likelihood for High-dimensional Data Integration

Cited by: 0
Authors
Zhang, Guanlin [1]
Wu, Yuehua [1]
Gao, Xin [1]
Affiliations
[1] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Bayesian method; data integration; Gibbs sampling; model selection; sub-Gaussian; subexponential; union support recovery; variable selection; consistency; inference; priors
DOI
10.1002/cjs.11800
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We consider data integration problems where correlated data are collected from multiple platforms. Within each platform, there are linear relationships between the responses and a collection of predictors. We extend the linear models to allow random errors from a much wider family of sub-Gaussian and subexponential distributions. The goal is to select important predictors across multiple platforms, where the number of predictors and the number of observations both increase to infinity. We combine the marginal densities of the responses obtained from different platforms to form a composite likelihood and propose a model selection criterion based on Bayesian composite posterior probabilities. Under some regularity conditions, we prove that the model selection criterion consistently recovers the union support of the predictors, even when the true model size diverges.
Pages: 924-938
Number of pages: 15