Elastic Consistency: A Practical Consistency Model for Distributed Stochastic Gradient Descent

Cited by: 0
Authors
Nadiradze, Giorgi [1]
Markov, Ilia [1]
Chatterjee, Bapi [1]
Kungurtsev, Vyacheslav [2]
Alistarh, Dan [1]
Affiliations
[1] IST Austria, Klosterneuburg, Austria
[2] Czech Tech Univ, Prague, Czech Republic
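Funding
EU Horizon 2020; European Research Council
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract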
One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Most of these models are trained employing variants of stochastic gradient descent (SGD) based optimization. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency, decouples the system-specific aspects of the implementation from the SGD convergence requirements, giving a general way to obtain convergence bounds for a wide variety of distributed SGD methods used in practice. Elastic consistency can be used to re-derive or improve several previous convergence bounds in message-passing and shared-memory settings, but also to analyze new models and distribution schemes. In particular, we propose and analyze a new synchronization-avoiding scheme for distributed SGD, and show that it can be used to efficiently train deep convolutional models for image classification.
Pages: 9037-9045
Number of pages: 9