Bayesian Model Selection via Composite Likelihood for High-dimensional Data Integration

Authors
Zhang, Guanlin [1]
Wu, Yuehua [1]
Gao, Xin [1]
Affiliations
[1] York University, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Bayesian method; data integration; Gibbs sampling; model selection; sub-Gaussian; subexponential; union support recovery; variable selection; consistency; inference; priors
DOI
10.1002/cjs.11800
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We consider data integration problems where correlated data are collected from multiple platforms. Within each platform, there are linear relationships between the responses and a collection of predictors. We extend the linear models to allow random errors from the much wider family of sub-Gaussian and subexponential distributions. The goal is to select important predictors across multiple platforms, where the number of predictors and the number of observations both increase to infinity. We combine the marginal densities of the responses obtained from different platforms to form a composite likelihood and propose a model selection criterion based on Bayesian composite posterior probabilities. Under some regularity conditions, we prove that the model selection criterion consistently recovers the union support of the predictors, even when the true model size diverges.
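As a rough illustration of the idea in the abstract (not the authors' criterion), the sketch below sums per-platform marginal log-likelihoods into a composite log-likelihood and scores candidate union supports. The Gaussian working likelihood, the BIC-style penalty, the simulated data, and the names `gaussian_marginal_loglik` and `composite_score` are all assumptions made for this example; the paper itself uses Bayesian composite posterior probabilities and allows sub-Gaussian and subexponential errors.

```python
import itertools
import numpy as np

def gaussian_marginal_loglik(y, X, support):
    """Profile Gaussian log-likelihood of one platform using only `support` columns.
    Assumption: Gaussian working model, chosen here purely for illustration."""
    n = y.shape[0]
    if len(support) > 0:
        Xs = X[:, list(support)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    else:
        resid = y
    sigma2 = resid @ resid / n  # maximum-likelihood error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)

def composite_score(ys, Xs, support):
    """Composite log-likelihood (sum of platform marginals) minus a BIC-style
    penalty. The penalty is a hypothetical stand-in, not the paper's criterion."""
    cl = sum(gaussian_marginal_loglik(y, X, support) for y, X in zip(ys, Xs))
    n_total = sum(y.shape[0] for y in ys)
    return cl - 0.5 * len(support) * len(ys) * np.log(n_total)

# Simulated example: two platforms, six predictors.
# Platform 1 depends on predictor 0, platform 2 on predictor 1,
# so the union support is {0, 1}.
rng = np.random.default_rng(0)
n, p = 200, 6
Xs = [rng.standard_normal((n, p)) for _ in range(2)]
ys = [
    2.0 * Xs[0][:, 0] + rng.standard_normal(n),
    2.0 * Xs[1][:, 1] + rng.standard_normal(n),
]

# Exhaustive search over candidate supports of size at most 3.
candidates = [
    s for k in range(4) for s in itertools.combinations(range(p), k)
]
best = max(candidates, key=lambda s: composite_score(ys, Xs, s))
print("selected union support:", best)
```

Fitting every platform on the same candidate set is what makes the target the union support: a predictor stays in the model if dropping it hurts the fit on at least one platform. Exhaustive search is only feasible for small numbers of predictors; the record's keywords mention Gibbs sampling, which this sketch does not implement.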
Pages: 924-938
Number of pages: 15