Distributed Bayesian posterior voting strategy for massive data

Cited by: 1

Authors:

Li, Xuerui ^{[1]}

Kang, Lican ^{[2]}

Liu, Yanyan ^{[1]}

Wu, Yuanshan ^{[3]}

Affiliations:

[1] Wuhan Univ, Sch Math & Stat, Wuhan, Peoples R China

[2] NUS Med Sch, Ctr Quantitat Med Duke, Singapore, Singapore

[3] Zhongnan Univ Econ, Sch Stat & Math, Wuhan, Peoples R China

Source:

ELECTRONIC RESEARCH ARCHIVE | 2022 / Vol. 30 / Issue 05

Keywords:

hierarchical Bayes formulation; massive data; majority voting; split-and-conquer; shrinkage prior; variable selection; regression

DOI:

10.3934/era.2022098

Chinese Library Classification (CLC) number:

O1 [Mathematics]

Discipline classification code:

0701; 070101

Abstract:

The emergence of massive data has driven recent interest in statistical learning and large-scale algorithms for analysis on distributed platforms. One widely used statistical approach is split-and-conquer (SaC), which originally aggregated all local solutions through a simple average to reduce the computational burden caused by communication costs. Aiming at lower computation cost with acceptable accuracy, this paper extends SaC to Bayesian variable selection for ultrahigh-dimensional linear regression and builds BVSaC for aggregation. Suppose ultrahigh-dimensional data are stored in a distributed manner across multiple computing nodes, with each node holding a disjoint subset of the data. On each node, we perform variable selection and coefficient estimation through a hierarchical Bayes formulation. Then a weighted majority voting method, BVSaC, combines the local results to retain good performance. The proposed approach requires only a small portion of the computation cost on each local dataset and therefore eases the computational burden, especially in Bayesian computation, while sacrificing little accuracy; this in turn makes analyzing extraordinarily large datasets more feasible. Simulations and a real-world example show that the proposed approach performs as well as the whole-sample hierarchical Bayes method in terms of the accuracy of variable selection and estimation.
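The pipeline outlined in the abstract (disjoint subsets, local selection on each node, weighted majority voting) can be illustrated with a minimal Python sketch. This is not the paper's BVSaC implementation. The local hierarchical Bayes step is replaced here by a plain stand-in (ridge regression followed by hard thresholding). The voting weights (proportional to subset size), the 0.5 cutoff, and all function names are assumptions made for illustration only.

```python
import numpy as np


def local_select(X, y, threshold=0.5, ridge=1.0):
    """Stand-in for the local step on one node (NOT hierarchical Bayes):
    ridge-regularized least squares, then hard thresholding."""
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    selected = np.abs(beta) > threshold
    return selected, beta


def weighted_majority_vote(selections, weights, cutoff=0.5):
    """Keep a variable when the weighted share of nodes selecting it
    exceeds `cutoff` (weights are normalized to sum to one)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    score = w @ np.asarray(selections, dtype=float)
    return score > cutoff


def split_and_vote(X, y, n_nodes, threshold=0.5):
    """Split rows into disjoint blocks, select locally, then vote."""
    blocks = np.array_split(np.arange(X.shape[0]), n_nodes)
    selections, betas, weights = [], [], []
    for idx in blocks:
        sel, beta = local_select(X[idx], y[idx], threshold)
        selections.append(sel)
        betas.append(beta)
        weights.append(len(idx))  # assumed weight: local sample size
    keep = weighted_majority_vote(selections, weights)
    w = np.asarray(weights, dtype=float) / sum(weights)
    # Weighted average of local estimates, zeroed for rejected variables.
    beta_hat = np.where(keep, w @ np.asarray(betas), 0.0)
    return keep, beta_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20))
    beta_true = np.zeros(20)
    beta_true[:3] = [2.0, -3.0, 1.5]
    y = X @ beta_true + rng.normal(size=2000)
    keep, beta_hat = split_and_vote(X, y, n_nodes=4)
    print("selected variables:", np.flatnonzero(keep))
```

Each node works only on its own block, so the only quantities that need to be communicated in this sketch are the local selection indicators, the local coefficient estimates, and the subset sizes.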


Pages: 1936-1953

Page count: 18

