Bayesian Distributed Stochastic Gradient Descent

Cited by: 0
Authors
Teng, Michael [1 ]
Wood, Frank [2 ]
Affiliations
[1] Univ Oxford, Dept Engn Sci, Oxford, England
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel computing clusters. This algorithm uses amortized inference in a deep generative model to perform joint posterior predictive inference of mini-batch gradient computation times in a compute-cluster-specific manner. Specifically, our algorithm mitigates the straggler effect in synchronous, gradient-based optimization by choosing an optimal cutoff beyond which mini-batch gradient messages from slow workers are ignored. The principal novel contribution and finding of this work goes beyond this: using run-times predicted by a generative model of cluster worker performance improves over the static-cutoff prior art, leading to higher gradient-computation throughput on large compute clusters. In our experiments we show that eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but sometimes also increases the overall rate of convergence as a function of wall-clock time by virtue of eliminating idleness.
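
The abstract describes the mechanism at a level that can be sketched in code: a synchronous parameter server waits for mini-batch gradients only up to a cutoff predicted from a model of per-worker run-times, averages whatever arrived in time, and applies the update while ignoring stragglers. The Python sketch below is a minimal, self-contained simulation of that cutoff rule under stated assumptions, not the paper's implementation: the amortized-inference run-time model is replaced by a simple empirical quantile over recent worker timings, worker run-times and gradients are synthetic, and the names predict_cutoff, sgd_step, and simulate_bdsgd are illustrative.

import random

# Hypothetical stand-in for BDSGD's posterior predictive run-time model:
# here we simply take an empirical quantile of recently observed worker run-times.
def predict_cutoff(recent_times, quantile=0.8):
    """Return a per-iteration cutoff (seconds) from past mini-batch run-times."""
    times = sorted(recent_times)
    idx = min(int(quantile * len(times)), len(times) - 1)
    return times[idx]

def sgd_step(params, grads, lr=0.1):
    """Average the collected gradients and take one SGD step."""
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg)]

def simulate_bdsgd(n_workers=8, n_iters=5, dim=3, seed=0):
    rng = random.Random(seed)
    params = [0.0] * dim
    # Warm-up history of observed run-times (log-normal is an arbitrary choice here).
    history = [rng.lognormvariate(0.0, 0.5) for _ in range(4 * n_workers)]

    for it in range(n_iters):
        cutoff = predict_cutoff(history)
        grads = []
        for _ in range(n_workers):
            run_time = rng.lognormvariate(0.0, 0.5)   # simulated gradient compute time
            history.append(run_time)
            if run_time <= cutoff:
                # Worker finished before the cutoff: its (synthetic) gradient is used.
                grads.append([rng.gauss(p, 1.0) for p in params])
            # else: straggler; its gradient message is ignored for this step.
        if grads:
            params = sgd_step(params, grads)
        print(f"iter {it}: cutoff={cutoff:.2f}s, used {len(grads)}/{n_workers} workers")
    return params

if __name__ == "__main__":
    simulate_bdsgd()

In the paper itself, the cutoff comes from joint posterior predictive inference over worker run-times via amortized inference in a deep generative model; the quantile heuristic above only illustrates where that prediction plugs into the synchronous update loop.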
Pages: 11
Related Papers
50 items in total
  • [1] Stochastic Gradient Descent as Approximate Bayesian Inference
    Mandt, Stephan
    Hoffman, Matthew D.
    Blei, David M.
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [2] Stochastic gradient descent as approximate Bayesian inference
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH (Microtome Publishing), 2017, 18
  • [3] Predicting Throughput of Distributed Stochastic Gradient Descent
    Li, Zhuojin
    Paolieri, Marco
    Golubchik, Leana
    Lin, Sung-Han
    Yan, Wumo
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 2900 - 2912
  • [4] Distributed stochastic gradient descent with discriminative aggregating
    Chen, Zhen-Hong
    Lan, Yan-Yan
    Guo, Jia-Feng
    Cheng, Xue-Qi
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38 (10): 2054 - 2063
  • [5] Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation
    Picheny, Victor
    Dutordoir, Vincent
    Artemev, Artem
    Durrande, Nicolas
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III, 2021, 12459 : 431 - 446
  • [6] BAYESIAN STOCHASTIC GRADIENT DESCENT FOR STOCHASTIC OPTIMIZATION WITH STREAMING INPUT DATA
    Liu, Tianyi
    Lin, Yifan
    Zhou, Enlu
    [J]. SIAM JOURNAL ON OPTIMIZATION, 2024, 34 (01) : 389 - 418
  • [7] Convergence analysis of distributed stochastic gradient descent with shuffling
    Meng, Qi
    Chen, Wei
    Wang, Yue
    Ma, Zhi-Ming
    Liu, Tie-Yan
    [J]. NEUROCOMPUTING, 2019, 337 : 46 - 57
  • [8] Distributed Stochastic Gradient Descent Using LDGM Codes
    Horii, Shunsuke
    Yoshida, Takahiro
    Kobayashi, Manabu
    Matsushima, Toshiyasu
    [J]. 2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2019, : 1417 - 1421
  • [9] Distributed Stochastic Gradient Descent With Compressed and Skipped Communication
    Phuong, Tran Thi
    Phong, Le Trieu
    Fukushima, Kazuhide
    [J]. IEEE ACCESS, 2023, 11 : 99836 - 99846
  • [10] Distributed and asynchronous Stochastic Gradient Descent with variance reduction
    Ming, Yuewei
    Zhao, Yawei
    Wu, Chengkun
    Li, Kuan
    Yin, Jianping
    [J]. NEUROCOMPUTING, 2018, 281 : 27 - 36