Distributed Bayesian posterior voting strategy for massive data

Times cited: 1
Authors
Li, Xuerui [1]
Kang, Lican [2]
Liu, Yanyan [1]
Wu, Yuanshan [3]
Affiliations
[1] Wuhan Univ, Sch Math & Stat, Wuhan, Peoples R China
[2] NUS Med Sch, Ctr Quantitat Med Duke, Singapore, Singapore
[3] Zhongnan Univ Econ, Sch Stat & Math, Wuhan, Peoples R China
Source
ELECTRONIC RESEARCH ARCHIVE | 2022 / Vol. 30 / No. 5
Keywords
Hierarchical Bayes formulation; massive data; majority voting; split-and-conquer; shrinkage prior; variable selection; regression
DOI
10.3934/era.2022098
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
The emergence of massive data has driven recent interest in statistical learning methods and large-scale algorithms for analysis on distributed platforms. One widely used statistical approach is split-and-conquer (SaC), which originally aggregated all local solutions through a simple average to reduce the computational burden caused by communication costs. Aiming at lower computational cost with acceptable accuracy, this paper extends SaC to Bayesian variable selection for ultrahigh-dimensional linear regression and builds BVSaC for aggregation. Suppose ultrahigh-dimensional data are stored in a distributed manner across multiple computing nodes, with each node holding a disjoint subset of the data. On each node, we perform variable selection and coefficient estimation through a hierarchical Bayes formulation. A weighted majority-voting method, BVSaC, then combines the local results while retaining good performance. The proposed approach requires only a small fraction of the computational cost on each local dataset and therefore eases the computational burden, especially for Bayesian computation, while sacrificing only a little accuracy; this in turn makes it feasible to analyze extraordinarily large datasets. Simulations and a real-world example show that the proposed approach performs as well as the whole-sample hierarchical Bayes method in terms of variable selection and estimation accuracy.
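The aggregation step described in the abstract can be illustrated with a short sketch. The abstract does not specify how BVSaC weights the nodes or combines their coefficients, so the following Python example is a hypothetical illustration under stated assumptions. It is not the authors' algorithm. The sketch assumes that each node reports the variables it selected along with their local coefficient estimates; the hierarchical Bayes fit itself is not shown. It further assumes that each node carries a voting weight, such as its local sample size. A variable is kept when the weighted share of nodes that selected it exceeds a threshold (0.5 here), and each kept variable's estimates are averaged, by weight, over the nodes that selected it. The function name weighted_majority_vote and its parameters are illustrative only.

```python
from typing import Dict, Sequence


def weighted_majority_vote(
    local_coefs: Sequence[Dict[int, float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Dict[int, float]:
    """Combine per-node sparse estimates by weighted majority voting.

    Hypothetical sketch: the voting weights and the coefficient-averaging
    rule are assumptions, not the BVSaC procedure from the paper.

    local_coefs[k] maps each variable index selected on node k to its local
    coefficient estimate; a variable missing from the dict was not selected.
    weights[k] is node k's voting weight (assumed, e.g. its local sample size).
    """
    if len(local_coefs) != len(weights):
        raise ValueError("need exactly one weight per node")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")

    # Weighted share of nodes that selected each variable.
    share: Dict[int, float] = {}
    for coefs, w in zip(local_coefs, weights):
        for j in coefs:
            share[j] = share.get(j, 0.0) + w / total

    combined: Dict[int, float] = {}
    for j in sorted(share):
        if share[j] <= threshold:
            continue  # no weighted majority: the coefficient is treated as zero
        # Weighted average of the estimates from the nodes that selected j.
        voters = [(w, coefs[j]) for coefs, w in zip(local_coefs, weights) if j in coefs]
        combined[j] = sum(w * b for w, b in voters) / sum(w for w, _ in voters)
    return combined


if __name__ == "__main__":
    # Made-up local results from three nodes holding 100, 100 and 200 rows.
    nodes = [
        {0: 1.9, 2: -0.8},
        {0: 2.1, 5: 0.3},
        {0: 2.0, 2: -1.0},
    ]
    print(weighted_majority_vote(nodes, weights=[100, 100, 200]))
```

With a strict-majority threshold, the combined model is sparser than the union of the local selections. In the made-up example, variable 5 is chosen by only one node holding a quarter of the total weight, so it is dropped. Variables 0 and 2 are kept.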
Pages: 1936-1953
Page count: 18