Bayesian Model Selection via Composite Likelihood for High-dimensional Data Integration

Cited by: 0
Authors
Zhang, Guanlin [1]
Wu, Yuehua [1]
Gao, Xin [1]
Affiliations
[1] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Bayesian method; data integration; Gibbs sampling; model selection; sub-Gaussian; subexponential; union support recovery; variable selection; consistency; inference; priors
DOI
10.1002/cjs.11800
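Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract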
We consider data integration problems where correlated data are collected from multiple platforms. Within each platform, there are linear relationships between the responses and a collection of predictors. We extend the linear models to include random errors coming from a much wider family of sub-Gaussian and subexponential distributions. The goal is to select important predictors across multiple platforms, where the number of predictors and the number of observations both increase to infinity. We combine the marginal densities of the responses obtained from different platforms to form a composite likelihood and propose a model selection criterion based on Bayesian composite posterior probabilities. Under some regularity conditions, we prove that the model selection criterion consistently recovers the union support of the predictors, even when the true model size diverges.
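As a rough illustration of the construction the abstract describes, the display below writes out a generic composite likelihood, union support, and composite posterior for K platforms. The notation (K, X_k, \beta_k, f_k, \ell_c, S^{*}, \pi_c) is assumed here for illustration and is not taken from the paper; the paper's actual criterion, priors, and regularity conditions may differ.

```latex
% Illustrative notation only (an assumption, not the paper's own symbols).
% Platform-wise linear models with sub-Gaussian or subexponential errors:
\[
  y_k = X_k \beta_k + \varepsilon_k, \qquad k = 1, \dots, K .
\]
% Composite log-likelihood: the sum of the platforms' marginal log-densities,
% ignoring the correlation between platforms:
\[
  \ell_c(\theta) = \sum_{k=1}^{K} \log f_k(y_k \mid X_k ; \theta_k) .
\]
% Union support: predictors that are active in at least one platform:
\[
  S^{*} = \bigcup_{k=1}^{K} \{\, j : \beta_{kj} \neq 0 \,\} .
\]
% A generic composite posterior probability for a candidate model s:
\[
  \pi_c(s \mid y) \;\propto\; \pi(s) \int \exp\{\ell_c(\theta_s)\}\, \pi(\theta_s \mid s)\, d\theta_s .
\]
```

In this reading, selection consistency means that the model maximizing \pi_c(s \mid y) equals S^{*} with probability tending to one as the number of observations and the number of predictors grow.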
Pages: 924-938
Number of pages: 15
Related papers
50 items in total
  • [21] Proximal nested sampling for high-dimensional Bayesian model selection
    Cai, Xiaohao
    McEwen, Jason D.
    Pereyra, Marcelo
    STATISTICS AND COMPUTING, 2022, 32 (05)
  • [22] Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection
    Shen, Yihang
    Kingsford, Carl
    INTERNATIONAL CONFERENCE ON AUTOMATED MACHINE LEARNING, VOL 224, 2023, 224
  • [23] Linear Kernel Tests via Empirical Likelihood for High-Dimensional Data
    Ding, Lizhong
    Liu, Zhi
    Li, Yu
    Liao, Shizhong
    Liu, Yong
    Yang, Peng
    Yu, Ge
    Shao, Ling
    Gao, Xin
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 3454 - 3461
  • [24] Feature selection using autoencoders with Bayesian methods to high-dimensional data
    Shu, Lei
    Huang, Kun
    Jiang, Wenhao
    Wu, Wenming
    Liu, Hongling
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (06) : 7397 - 7406
  • [25] Simultaneous Feature and Model Selection for High-Dimensional Data
    Perolini, Alessandro
    Guerif, Sebastien
    2011 23RD IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2011), 2011, : 47 - 50
  • [26] Penalised empirical likelihood for the additive hazards model with high-dimensional data
    Fang, Jianglin
    Liu, Wanrong
    Lu, Xuewen
    JOURNAL OF NONPARAMETRIC STATISTICS, 2017, 29 (02) : 326 - 345
  • [27] Calibration of the empirical likelihood for high-dimensional data
    Liu, Yukun
    Zou, Changliang
    Wang, Zhaojun
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2013, 65 (03) : 529 - 550
  • [28] Calibration of the empirical likelihood for high-dimensional data
    Yukun Liu
    Changliang Zou
    Zhaojun Wang
    Annals of the Institute of Statistical Mathematics, 2013, 65 : 529 - 550
  • [29] Bayesian shrinkage models for integration and analysis of multiplatform high-dimensional genomics data
    Xue, Hao
    Chakraborty, Sounak
    Dey, Tanujit
    STATISTICAL ANALYSIS AND DATA MINING, 2024, 17 (02)
  • [30] Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions
    Bondell, Howard D.
    Reich, Brian J.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2012, 107 (500) : 1610 - 1624