Distributed Bayesian posterior voting strategy for massive data

Times cited: 1
Authors
Li, Xuerui [1]
Kang, Lican [2]
Liu, Yanyan [1]
Wu, Yuanshan [3]
Affiliations
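[1] School of Mathematics and Statistics, Wuhan University, Wuhan, People's Republic of China
[2] Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
[3] School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, People's Republic of China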
Source
ELECTRONIC RESEARCH ARCHIVE, 2022, Vol. 30, No. 5
Keywords
Hierarchical Bayes formulation; massive data; majority voting; split-and-conquer; shrinkage prior; variable selection; regression
DOI
10.3934/era.2022098
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
The emergence of massive data has driven recent interest in developing statistical learning methods and large-scale algorithms for analysis on distributed platforms. One widely used statistical approach is split-and-conquer (SaC), which originally aggregated all local solutions through a simple average to reduce the computational burden caused by communication costs. Aiming for lower computational cost with acceptable accuracy, this paper extends SaC to Bayesian variable selection for ultrahigh-dimensional linear regression and builds an aggregation procedure, BVSaC. Suppose ultrahigh-dimensional data are stored in a distributed manner across multiple computing nodes, with each node holding a disjoint subset of the data. On each node, we perform variable selection and coefficient estimation through a hierarchical Bayes formulation. Then BVSaC, a weighted majority-voting method, combines the local results while retaining good performance. Because each node processes only its own subset, the proposed approach requires only a small fraction of the whole-sample computational cost on each local dataset and therefore eases the computational burden, especially in Bayesian computation, while sacrificing only a little accuracy; this in turn makes the analysis of extraordinarily large datasets more feasible. Simulations and a real-world example show that the proposed approach performs as well as the whole-sample hierarchical Bayes method in terms of variable-selection and estimation accuracy.
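To make the aggregation step concrete, here is a minimal Python sketch of weighted majority voting over per-node selection results. It assumes each node has already run its local hierarchical Bayes fit and reports the variables it selected together with their coefficient estimates. The `LocalResult` structure, the use of local sample size as the voting weight, the 1/2 vote threshold, and averaging coefficients only over the nodes that selected a variable are all illustrative assumptions, not the weighting scheme specified for BVSaC.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LocalResult:
    """Hypothetical output of one computing node after its local Bayesian fit."""
    n: int                  # local sample size; used here (by assumption) as the vote weight
    coef: dict[int, float]  # selected variable index -> local coefficient estimate


def weighted_majority_vote(results: list[LocalResult],
                           threshold: float = 0.5) -> dict[int, float]:
    """Keep variable j if the weighted share of nodes selecting j exceeds `threshold`;
    estimate its coefficient as the weight-averaged estimate over the nodes that selected it."""
    total = sum(r.n for r in results)

    # Accumulate each variable's weighted vote share across nodes.
    votes: dict[int, float] = {}
    for r in results:
        for j in r.coef:
            votes[j] = votes.get(j, 0.0) + r.n / total

    aggregated: dict[int, float] = {}
    for j, share in sorted(votes.items()):
        if share > threshold:  # strict majority of the total weight
            supporters = [r for r in results if j in r.coef]
            w = sum(r.n for r in supporters)
            aggregated[j] = sum(r.n * r.coef[j] for r in supporters) / w
    return aggregated


# Usage with made-up local results from three nodes.
nodes = [
    LocalResult(n=1000, coef={0: 1.9, 3: -0.8, 7: 0.1}),
    LocalResult(n=1000, coef={0: 2.1, 3: -1.2}),
    LocalResult(n=2000, coef={0: 2.0, 3: -1.0, 5: 0.3}),
]
print(weighted_majority_vote(nodes))  # prints {0: 2.0, 3: -1.0}
```

Because the threshold is strict, variable 5 (backed by exactly half of the total weight) is dropped, and variable 7 (one quarter) is dropped too; lowering `threshold` trades sparsity for sensitivity.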
Pages: 1936-1953
Number of pages: 18