Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Cited: 0
Authors
Zheng, Shuai [1,2]
Huang, Ziyue [1]
Kwok, James T. [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Amazon Web Serv, Seattle, WA 98109 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence is based on unrealistic assumptions and it can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. Convergence analysis on nonconvex problems is provided for general gradient compressors. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same testing accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same testing accuracy as momentum SGD using full-precision gradients, but with 46% less wall clock time.
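The blockwise 1-bit compression with error-feedback described in the abstract can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's reference implementation: the function names are invented here, the scaling factor (the block's mean absolute value) is one common choice for scaled-sign compression and is assumed rather than taken from the paper, and the two-way (server-side) compression and Nesterov momentum are omitted.

```python
import numpy as np

def blockwise_onebit_compress(grad, block_size=4096):
    """Compress a float gradient blockwise: each block is represented by
    its sign pattern times one scalar (here the block's mean absolute value),
    so only 1 bit per entry plus one float per block needs to be sent."""
    flat = grad.ravel()
    out = np.empty(flat.shape, dtype=float)
    for start in range(0, flat.size, block_size):
        block = flat[start:start + block_size]
        scale = np.mean(np.abs(block))          # one scalar per block
        out[start:start + block_size] = scale * np.sign(block)
    return out.reshape(grad.shape)

def worker_step(grad, residual, block_size=4096):
    """Error-feedback step: compress (gradient + accumulated error) and
    keep the new compression error locally for the next iteration."""
    corrected = grad + residual
    compressed = blockwise_onebit_compress(corrected, block_size)
    new_residual = corrected - compressed
    return compressed, new_residual
```

The error-feedback residual is what distinguishes this family of methods from plain signSGD: the information lost to compression at one step is reinjected at the next, which is what the paper's convergence analysis relies on.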
Pages: 11