Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent

Cited by: 0
Authors:
Shen, Shuheng [1,2]
Xu, Linli [1,2]
Liu, Jingchang [1,2]
Liang, Xianfeng [1,2]
Cheng, Yifei [1,3]
Affiliations:
[1] Univ Sci & Technol China, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Anhui, Peoples R China
[3] Univ Sci & Technol China, Sch Data Sci, Hefei, Anhui, Peoples R China
Funding:
National Natural Science Foundation of China
Keywords:
DOI: not available
CLC number:
TP18 [Theory of Artificial Intelligence]
Discipline codes:
081104; 0812; 0835; 1405
Abstract
With the increase in the amount of data and the expansion of model scale, distributed parallel training has become an important and successful technique for addressing the resulting optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, in practice they are significantly limited by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm that runs computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD achieves a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has lower communication complexity and a better time speedup compared with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 GeForce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3× faster than traditional synchronous SGD.
Pages: 4582-4589
Number of pages: 8
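
As a rough illustration of the decoupling described in the abstract, the sketch below overlaps the model-averaging communication of one round with the local SGD computation of the same round, then applies the locally computed updates on top of the freshly averaged model. It is a minimal single-process simulation, not the authors' implementation: the all_reduce_average helper, the thread-based overlap, and the toy quadratic objective are assumptions standing in for a real collective-communication backend and real parallel workers.

# Minimal single-process sketch of computation/communication decoupling
# (CoCoD-SGD style). Assumptions: all_reduce_average stands in for a real
# all-reduce collective, the workers are simulated sequentially, and the
# objective is a toy noisy quadratic.
import threading
import numpy as np

def all_reduce_average(local_models):
    # Stand-in for a collective all-reduce: average the workers' parameters.
    return np.mean(local_models, axis=0)

def local_sgd_steps(x, grad_fn, lr, num_steps, rng):
    # Run a few local SGD steps and return the accumulated update (new - old).
    x_start = x.copy()
    for _ in range(num_steps):
        x = x - lr * grad_fn(x, rng)
    return x - x_start

def cocod_sgd(grad_fn, dim, num_workers=4, rounds=50, local_steps=8, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Every worker starts from the same model.
    models = [np.zeros(dim) for _ in range(num_workers)]
    for _ in range(rounds):
        averaged = {}

        def communicate():
            # Communication "thread": average the workers' current models.
            averaged["model"] = all_reduce_average(models)

        comm_thread = threading.Thread(target=communicate)
        comm_thread.start()

        # Computation proceeds in parallel with the averaging above:
        # each worker computes a local update starting from its current model.
        updates = [local_sgd_steps(m, grad_fn, lr, local_steps, rng) for m in models]

        comm_thread.join()
        # Apply the locally computed updates on top of the freshly averaged model.
        models = [averaged["model"] + u for u in updates]
    return all_reduce_average(models)

if __name__ == "__main__":
    target = np.ones(10)
    # Toy stochastic gradient of 0.5 * ||x - target||^2 with Gaussian noise.
    grad = lambda x, rng: (x - target) + 0.01 * rng.standard_normal(x.shape)
    print(cocod_sgd(grad, dim=10))

Because the averaging step only needs the models from the start of the round, it can run concurrently with the local computation; the per-round wall-clock cost is roughly max(computation, communication) rather than their sum, which is the source of the speedup claimed in the abstract.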