Elastic Consistency: A Practical Consistency Model for Distributed Stochastic Gradient Descent

Cited by: 0
Authors
Nadiradze, Giorgi [1 ]
Markov, Ilia [1 ]
Chatterjee, Bapi [1 ]
Kungurtsev, Vyacheslav [2 ]
Alistarh, Dan [1 ]
Affiliations
[1] IST Austria, Klosterneuburg, Austria
[2] Czech Tech Univ, Prague, Czech Republic
Funding
EU Horizon 2020; European Research Council (ERC);
Keywords
DOI
N/A
CLC classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Most of these models are trained using variants of stochastic gradient descent (SGD). In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency, decouples the system-specific aspects of the implementation from the SGD convergence requirements, giving a general way to obtain convergence bounds for a wide variety of distributed SGD methods used in practice. Elastic consistency can be used to re-derive or improve several previous convergence bounds in message-passing and shared-memory settings, but also to analyze new models and distribution schemes. In particular, we propose and analyze a new synchronization-avoiding scheme for distributed SGD, and show that it can be used to efficiently train deep convolutional models for image classification.
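The abstract describes elastic consistency informally: the (possibly stale or inconsistent) parameter view a worker uses to compute its gradient must stay within a bounded distance of the true iterate. As an illustrative sketch only, not the paper's actual scheme, the toy simulation below runs SGD on a simple quadratic objective where each gradient may be computed on a parameter copy up to `tau` steps old; the function name `stale_sgd` and the fixed staleness bound `tau` are assumptions for illustration, bounded staleness being just one simple way such a consistency bound can arise.

```python
import numpy as np

def stale_sgd(dim=10, steps=200, lr=0.1, tau=3, seed=0):
    """Toy delayed SGD on f(x) = 0.5 * ||x||^2 (so grad f(v) = v).

    Each step computes the gradient at a parameter view at most `tau`
    iterations old, a simple stand-in for the bounded-deviation views
    that an elastic-consistency-style condition permits.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    history = [x.copy()]  # past iterates, to draw stale views from
    for _ in range(steps):
        # Worker reads a view at most `tau` steps old (hypothetical scheme).
        delay = int(rng.integers(0, tau + 1))
        v = history[max(0, len(history) - 1 - delay)]
        grad = v  # gradient of 0.5 * ||v||^2 at the stale view
        x = x - lr * grad
        history.append(x.copy())
    return x

if __name__ == "__main__":
    # Norm of the final iterate; should be much smaller than the
    # norm of the random starting point despite the stale gradients.
    print(np.linalg.norm(stale_sgd()))
```

For a small enough learning rate, the perturbation introduced by staleness stays bounded and the iterates still converge toward the optimum; this is the intuition that the elastic consistency condition formalizes and turns into general convergence bounds.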
Pages: 9037-9045 (9 pages)