Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Cited by: 0

Authors:

Zheng, Shuai [1,2]

Huang, Ziyue [1]

Kwok, James T. [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China

[2] Amazon Web Serv, Seattle, WA 98109 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence analysis relies on unrealistic assumptions, and the method can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. Convergence analysis on nonconvex problems is provided for general gradient compressors. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same testing accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same testing accuracy as momentum SGD using full-precision gradients, but with 46% less wall clock time.
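The blockwise 1-bit compression with a scaling factor described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the scaling factor is computed, so the mean absolute value of each block is assumed here, and all function names are hypothetical. Momentum, the worker/server split, and two-way compression are left out; only the compressor, one error-feedback step, and the "nearly 32x" arithmetic (assuming 32-bit floats) are shown.

```python
from typing import List, Sequence, Tuple


def blockwise_compress(vec: Sequence[float], block_sizes: Sequence[int]) -> List[float]:
    """Replace every entry of each block by scale * sign(entry).

    Assumption: scale is the mean absolute value of the block. Only one
    float (the scale) plus one sign bit per entry would need to be sent.
    """
    if sum(block_sizes) != len(vec):
        raise ValueError("block sizes must cover the whole vector")
    out: List[float] = []
    start = 0
    for size in block_sizes:
        block = vec[start:start + size]
        scale = sum(abs(x) for x in block) / size
        out.extend(scale if x >= 0 else -scale for x in block)
        start += size
    return out


def error_feedback_step(
    grad: Sequence[float],
    residual: Sequence[float],
    block_sizes: Sequence[int],
) -> Tuple[List[float], List[float]]:
    """Compress (grad + residual) and carry the compression error forward."""
    corrected = [g + e for g, e in zip(grad, residual)]
    sent = blockwise_compress(corrected, block_sizes)
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual


def compression_ratio(block_sizes: Sequence[int], float_bits: int = 32) -> float:
    """Dense bits divided by compressed bits (1 bit per entry + 1 float per block)."""
    dense_bits = float_bits * sum(block_sizes)
    compressed_bits = sum(size + float_bits for size in block_sizes)
    return dense_bits / compressed_bits


if __name__ == "__main__":
    sent, residual = error_feedback_step([0.5, -1.5, 2.0, -2.0], [0.0] * 4, [2, 2])
    print(sent, residual)
    print(compression_ratio([1_000_000]))
```

Under these assumptions the ratio falls slightly below 32 because each block also carries one full-precision scale, which is consistent with the abstract's "nearly 32x". The residual returned by `error_feedback_step` is added to the next gradient before compression, so information lost to quantization is delayed rather than discarded.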

Pages: 11

50 entries in total

[21] Communication-Efficient Nonconvex Federated Learning With Error Feedback for Uplink and Downlink
Zhou, Xingcai
Chang, Le
Cao, Jinde
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 12
[22] Communication-Efficient Decentralized Local SGD over Undirected Networks
Qin, Tiancheng
Etesami, S. Rasoul
Uribe, Cesar A.
2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 3361 - 3366
[23] Communication-Efficient Quantized SGD for Learning Polynomial Neural Network
Yang, Zhanpeng
Zhou, Yong
Wu, Youlong
Shi, Yuanming
2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC), 2021,
[24] QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Alistarh, Dan
Grubic, Demjan
Li, Jerry Z.
Tomioka, Ryota
Vojnovic, Milan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[25] Communication-efficient Federated Learning via Quantized Clipped SGD
Jia, Ninghui
Qu, Zhihao
Ye, Baoliu
WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT I, 2021, 12937 : 559 - 571
[26] COMMUNICATION-EFFICIENT DISTRIBUTED MAX-VAR GENERALIZED CCA VIA ERROR FEEDBACK-ASSISTED QUANTIZATION
Shrestha, Sagar
Fu, Xiao
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9052 - 9056
[27] Adaptive Top-K in SGD for Communication-Efficient Distributed Learning in Multi-Robot Collaboration
Ruan, Mengzhe
Yan, Guangfeng
Xiao, Yuanzhang
Song, Linqi
Xu, Weitao
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, 18 (03) : 487 - 501
[28] Detached Error Feedback for Distributed SGD with Random Sparsification
Xu, An
Huang, Heng
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[29] Communication-Efficient Distributed Eigenspace Estimation
Charisopoulos, Vasileios
Benson, Austin R.
Damle, Anil
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2021, 3 (04) : 1067 - 1092
[30] FAST AND COMMUNICATION-EFFICIENT DISTRIBUTED PCA
Gang, Arpita
Raja, Haroon
Bajwa, Waheed U.
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7450 - 7454