Communication-Efficient Algorithms for Statistical Optimization

Cited by: 0

Authors:

Zhang, Yuchen [1]

Duchi, John C. [1]

Wainwright, Martin J. [2]

Affiliations:

[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA

Source:

Journal of Machine Learning Research | 2013 / Vol. 14

Keywords:

distributed learning; stochastic optimization; averaging; subsampling; stochastic approximation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the $N$ data samples evenly to $m$ machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error (MSE) that decays as $O(N^{-1} + (N/m)^{-2})$. Whenever $m \le \sqrt{N}$, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all $N$ samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as $O(N^{-1} + (N/m)^{-3})$, and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as $O(N^{-1} + (N/m)^{-3/2})$, easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with $N \approx 2.4 \times 10^{8}$ samples and $d \approx 740,000$ covariates.
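The two one-shot schemes described in the abstract can be illustrated on a toy problem. The Python sketch below is not the authors' code. The exponential-rate model, the function names, the subsampling ratio `r = 0.5`, and the combination rule `(full - r * sub) / (1 - r)` are all assumptions chosen for illustration, since the abstract does not spell out the bootstrap correction.

```python
import random
from statistics import fmean


def local_mle(xs):
    """MLE of an exponential rate from one machine's samples: 1 / sample mean.

    This estimator has a bias of order 1/len(xs), which makes the effect of
    averaging many small-shard estimates visible.
    """
    return 1.0 / fmean(xs)


def average_mixture(shards):
    """Plain averaging: each machine solves locally, then estimates are averaged."""
    return fmean(local_mle(s) for s in shards)


def subsampled_average_mixture(shards, r, rng):
    """Averaging with a subsampling-based bias correction (illustrative assumption).

    Each machine also re-solves on a subsample holding a fraction r of its data,
    drawn without replacement. The combination below cancels the leading
    1/n bias term shared by the full and subsampled local estimates.
    """
    full = fmean(local_mle(s) for s in shards)
    sub = fmean(
        local_mle(rng.sample(s, max(1, int(r * len(s))))) for s in shards
    )
    return (full - r * sub) / (1.0 - r)


# Toy setup (all values hypothetical): N samples split evenly over m machines.
rng = random.Random(0)
true_rate = 2.0
m, n = 5000, 20  # m machines, n = N/m samples each; note m is far above sqrt(N)
shards = [[rng.expovariate(true_rate) for _ in range(n)] for _ in range(m)]

avgm = average_mixture(shards)
savgm = subsampled_average_mixture(shards, r=0.5, rng=rng)

print(f"true rate           : {true_rate}")
print(f"average mixture     : {avgm:.4f}")
print(f"subsampled variant  : {savgm:.4f}")
```

In this regime each shard is small, so the plain average inherits the per-machine bias, while the corrected combination removes its leading term at the cost of somewhat higher variance. This mirrors, in a simplified way, the abstract's claim that the subsampling method is more robust to the amount of parallelization.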


Pages: 3321-3363

Page count: 43

