Bayesian Distributed Stochastic Gradient Descent

Cited by: 0
Authors
Teng, Michael [1 ]
Wood, Frank [2 ]
Affiliations
[1] Univ Oxford, Dept Engn Sci, Oxford, England
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel computing clusters. This algorithm uses amortized inference in a deep generative model to perform joint posterior predictive inference of mini-batch gradient computation times in a compute-cluster-specific manner. Specifically, our algorithm mitigates the straggler effect in synchronous, gradient-based optimization by choosing an optimal cutoff beyond which mini-batch gradient messages from slow workers are ignored. The principal novel contribution and finding of this work goes beyond this: using run-times predicted by a generative model of cluster worker performance improves over the static-cutoff prior art, leading to higher gradient-computation throughput on large compute clusters. In our experiments we show that eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but sometimes also increases the overall rate of convergence as a function of wall-clock time by virtue of eliminating idleness.
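
The abstract describes the mechanism at a level that can be sketched in code: a synchronous parameter server waits for mini-batch gradients only up to a cutoff predicted from a model of per-worker run-times, averages whatever arrived in time, and applies the update while ignoring stragglers. The Python sketch below is a minimal, self-contained simulation of that cutoff rule under stated assumptions, not the paper's implementation: the amortized-inference run-time model is replaced by a simple empirical quantile over recent worker timings, worker run-times and gradients are synthetic, and the names predict_cutoff, sgd_step, and simulate_bdsgd are illustrative.

import random

# Hypothetical stand-in for BDSGD's posterior predictive run-time model:
# here we simply take an empirical quantile of recently observed worker run-times.
def predict_cutoff(recent_times, quantile=0.8):
    """Return a per-iteration cutoff (seconds) from past mini-batch run-times."""
    times = sorted(recent_times)
    idx = min(int(quantile * len(times)), len(times) - 1)
    return times[idx]

def sgd_step(params, grads, lr=0.1):
    """Average the collected gradients and take one SGD step."""
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg)]

def simulate_bdsgd(n_workers=8, n_iters=5, dim=3, seed=0):
    rng = random.Random(seed)
    params = [0.0] * dim
    # Warm-up history of observed run-times (log-normal is an arbitrary choice here).
    history = [rng.lognormvariate(0.0, 0.5) for _ in range(4 * n_workers)]

    for it in range(n_iters):
        cutoff = predict_cutoff(history)
        grads = []
        for _ in range(n_workers):
            run_time = rng.lognormvariate(0.0, 0.5)   # simulated gradient compute time
            history.append(run_time)
            if run_time <= cutoff:
                # Worker finished before the cutoff: its (synthetic) gradient is used.
                grads.append([rng.gauss(p, 1.0) for p in params])
            # else: straggler; its gradient message is ignored for this step.
        if grads:
            params = sgd_step(params, grads)
        print(f"iter {it}: cutoff={cutoff:.2f}s, used {len(grads)}/{n_workers} workers")
    return params

if __name__ == "__main__":
    simulate_bdsgd()

In the paper itself, the cutoff comes from joint posterior predictive inference over worker run-times via amortized inference in a deep generative model; the quantile heuristic above only illustrates where that prediction plugs into the synchronous update loop.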
Pages: 11
Related Papers
50 items in total
  • [1] Stochastic Gradient Descent as Approximate Bayesian Inference
    Mandt, Stephan
    Hoffman, Matthew D.
    Blei, David M.
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [2] Stochastic gradient descent as approximate Bayesian inference
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH (Microtome Publishing), 2017, 18
  • [3] Predicting Throughput of Distributed Stochastic Gradient Descent
    Li, Zhuojin
    Paolieri, Marco
    Golubchik, Leana
    Lin, Sung-Han
    Yan, Wumo
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 2900 - 2912
  • [4] Distributed stochastic gradient descent with discriminative aggregating
    Chen, Zhen-Hong
    Lan, Yan-Yan
    Guo, Jia-Feng
    Cheng, Xue-Qi
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38 (10): 2054 - 2063
  • [5] Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation
    Picheny, Victor
    Dutordoir, Vincent
    Artemev, Artem
    Durrande, Nicolas
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III, 2021, 12459 : 431 - 446
  • [6] BAYESIAN STOCHASTIC GRADIENT DESCENT FOR STOCHASTIC OPTIMIZATION WITH STREAMING INPUT DATA
    Liu, Tianyi
    Lin, Yifan
    Zhou, Enlu
    [J]. SIAM JOURNAL ON OPTIMIZATION, 2024, 34 (01) : 389 - 418
  • [7] Convergence analysis of distributed stochastic gradient descent with shuffling
    Meng, Qi
    Chen, Wei
    Wang, Yue
    Ma, Zhi-Ming
    Liu, Tie-Yan
    [J]. NEUROCOMPUTING, 2019, 337 : 46 - 57
  • [8] Distributed Stochastic Gradient Descent Using LDGM Codes
    Horii, Shunsuke
    Yoshida, Takahiro
    Kobayashi, Manabu
    Matsushima, Toshiyasu
    [J]. 2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2019, : 1417 - 1421
  • [9] Distributed Stochastic Gradient Descent With Compressed and Skipped Communication
    Phuong, Tran Thi
    Phong, Le Trieu
    Fukushima, Kazuhide
    [J]. IEEE ACCESS, 2023, 11 : 99836 - 99846
  • [10] Distributed and asynchronous Stochastic Gradient Descent with variance reduction
    Ming, Yuewei
    Zhao, Yawei
    Wu, Chengkun
    Li, Kuan
    Yin, Jianping
    [J]. NEUROCOMPUTING, 2018, 281 : 27 - 36