Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Cited by: 0
Authors
Zheng, Shuai [1,2]
Huang, Ziyue [1]
Kwok, James T. [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Amazon Web Serv, Seattle, WA 98109 USA
Keywords:
DOI: Not available
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence rests on unrealistic assumptions, and it can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. A convergence analysis on nonconvex problems is provided for general gradient compressors. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same test accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same test accuracy as momentum SGD using full-precision gradients, but with 46% less wall-clock time.
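To make the blockwise 1-bit compression and error-feedback ideas from the abstract concrete, the following is a minimal NumPy sketch of the worker-side step. The block size, learning rate, momentum coefficient, and the exact form of the Nesterov-style correction are illustrative assumptions, not the paper's exact algorithm; the server-side aggregation and the second (server-to-worker) compression of the two-way scheme are omitted.

import numpy as np

def blockwise_sign_compress(v, block_size=256):
    # Compress a flat vector into per-block signs, each scaled by the
    # block's mean absolute value. Signs cost 1 bit each, so with one
    # float scale per block this approaches a 32x reduction in size.
    out = np.empty_like(v)
    for start in range(0, v.size, block_size):
        block = v[start:start + block_size]
        scale = np.abs(block).mean()
        out[start:start + block_size] = scale * np.sign(block)
    return out

def worker_update(grad, momentum, error, lr=0.01, beta=0.9, block_size=256):
    # One worker-side step: Nesterov-style momentum, then error feedback.
    # The residual left over by compression is stored locally and added
    # back to the next correction, so the compression error is not lost.
    momentum[:] = beta * momentum + grad
    correction = lr * (grad + beta * momentum) + error
    compressed = blockwise_sign_compress(correction, block_size)
    error[:] = correction - compressed   # residual kept for the next step
    return compressed                    # message sent to the server

In the two-way scheme described above, the server would average the compressed messages from all workers, compress the average again (with its own error-feedback buffer), and broadcast the result, which each worker then subtracts from its parameters.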
Pages: 11
Related Papers (10 of 50 shown)
  • [1] Communication-Efficient Distributed SGD with Error-Feedback, Revisited
    Tran Thi Phuong
    Le Trieu Phong
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01) : 1373 - 1387
  • [2] Communication-efficient Distributed SGD with Sketching
    Ivkin, Nikita
    Rothchild, Daniel
    Ullah, Enayat
    Braverman, Vladimir
    Stoica, Ion
    Arora, Raman
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
    Singh, Navjot
    Data, Deepesh
    George, Jemin
    Diggavi, Suhas
    2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, : 1212 - 1217
  • [4] Communication-Efficient Distributed SGD With Compressed Sensing
    Tang, Yujie
    Ramanathan, Vikram
    Zhang, Junshan
    Li, Na
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2054 - 2059
  • [5] AC-SGD: Adaptively Compressed SGD for Communication-Efficient Distributed Learning
    Yan, Guangfeng
    Li, Tan
    Huang, Shao-Lun
    Lan, Tian
    Song, Linqi
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2022, 40 (09) : 2678 - 2693
  • [6] DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning
    Yan, Guangfeng
    Huang, Shao-Lun
    Lan, Tian
    Song, Linqi
    2021 IEEE 18TH INTERNATIONAL CONFERENCE ON MOBILE AD HOC AND SMART SYSTEMS (MASS 2021), 2021, : 136 - 144
  • [7] Communication-Efficient and Byzantine-Robust Distributed Learning with Error Feedback
    Ghosh, A.
    Maity, R. K.
    Kadhe, S.
    Mazumdar, A.
    Ramchandran, K.
    IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY, 2021, 2 (03) : 942 - 953
  • [8] A Random Access based Approach to Communication-Efficient Distributed SGD
    Choi, Jinho
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 4486 - 4491
  • [9] A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification
    Shi, Shaohuai
    Zhao, Kaiyong
    Wang, Qiang
    Tang, Zhenheng
    Chu, Xiaowen
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3411 - 3417
  • [10] cpSGD: Communication-efficient and differentially-private distributed SGD
    Agarwal, Naman
    Suresh, Ananda Theertha
    Yu, Felix
    Kumar, Sanjiv
    McMahan, H. Brendan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31