Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Cited by: 0
Authors
Zheng, Shuai [1,2]
Huang, Ziyue [1]
Kwok, James T. [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Amazon Web Serv, Seattle, WA 98109 USA
Keywords:
DOI: Not available
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence rests on unrealistic assumptions, and it can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. A convergence analysis on nonconvex problems is provided for general gradient compressors. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same test accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same test accuracy as momentum SGD using full-precision gradients, but with 46% less wall-clock time.
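To make the blockwise 1-bit compression and error-feedback ideas from the abstract concrete, the following is a minimal NumPy sketch of the worker-side step. The block size, learning rate, momentum coefficient, and the exact form of the Nesterov-style correction are illustrative assumptions, not the paper's exact algorithm; the server-side aggregation and the second (server-to-worker) compression of the two-way scheme are omitted.

import numpy as np

def blockwise_sign_compress(v, block_size=256):
    # Compress a flat vector into per-block signs, each scaled by the
    # block's mean absolute value. Signs cost 1 bit each, so with one
    # float scale per block this approaches a 32x reduction in size.
    out = np.empty_like(v)
    for start in range(0, v.size, block_size):
        block = v[start:start + block_size]
        scale = np.abs(block).mean()
        out[start:start + block_size] = scale * np.sign(block)
    return out

def worker_update(grad, momentum, error, lr=0.01, beta=0.9, block_size=256):
    # One worker-side step: Nesterov-style momentum, then error feedback.
    # The residual left over by compression is stored locally and added
    # back to the next correction, so the compression error is not lost.
    momentum[:] = beta * momentum + grad
    correction = lr * (grad + beta * momentum) + error
    compressed = blockwise_sign_compress(correction, block_size)
    error[:] = correction - compressed   # residual kept for the next step
    return compressed                    # message sent to the server

In the two-way scheme described above, the server would average the compressed messages from all workers, compress the average again (with its own error-feedback buffer), and broadcast the result, which each worker then subtracts from its parameters.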
Pages: 11
Related Papers (10 of 50 shown)
  • [1] Communication-Efficient Distributed SGD with Error-Feedback, Revisited
    Tran Thi Phuong
    Le Trieu Phong
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01) : 1373 - 1387
  • [2] Communication-efficient Distributed SGD with Sketching
    Ivkin, Nikita
    Rothchild, Daniel
    Ullah, Enayat
    Braverman, Vladimir
    Stoica, Ion
    Arora, Raman
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
    Singh, Navjot
    Data, Deepesh
    George, Jemin
    Diggavi, Suhas
    2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, : 1212 - 1217
  • [4] Communication-Efficient Distributed SGD With Compressed Sensing
    Tang, Yujie
    Ramanathan, Vikram
    Zhang, Junshan
    Li, Na
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2054 - 2059
  • [5] AC-SGD: Adaptively Compressed SGD for Communication-Efficient Distributed Learning
    Yan, Guangfeng
    Li, Tan
    Huang, Shao-Lun
    Lan, Tian
    Song, Linqi
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2022, 40 (09) : 2678 - 2693
  • [6] DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning
    Yan, Guangfeng
    Huang, Shao-Lun
    Lan, Tian
    Song, Linqi
    2021 IEEE 18TH INTERNATIONAL CONFERENCE ON MOBILE AD HOC AND SMART SYSTEMS (MASS 2021), 2021, : 136 - 144
  • [7] Communication-Efficient and Byzantine-Robust Distributed Learning with Error Feedback
    Ghosh, A.
    Maity, R. K.
    Kadhe, S.
    Mazumdar, A.
    Ramchandran, K.
    IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY, 2021, 2 (03) : 942 - 953
  • [8] A Random Access based Approach to Communication-Efficient Distributed SGD
    Choi, Jinho
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 4486 - 4491
  • [9] A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification
    Shi, Shaohuai
    Zhao, Kaiyong
    Wang, Qiang
    Tang, Zhenheng
    Chu, Xiaowen
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3411 - 3417
  • [10] cpSGD: Communication-efficient and differentially-private distributed SGD
    Agarwal, Naman
    Suresh, Ananda Theertha
    Yu, Felix
    Kumar, Sanjiv
    McMahan, H. Brendan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31