Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Cited by: 0
Authors: Zheng, Shuai [1,2]; Huang, Ziyue [1]; Kwok, James T. [1]
Affiliations:
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Amazon Web Serv, Seattle, WA 98109 USA
Keywords: none listed
DOI: not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence relies on unrealistic assumptions, and it can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. Convergence analysis on nonconvex problems is provided for general gradient compressors. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling factor, leading to a nearly 32x reduction in communication. Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same testing accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same testing accuracy as momentum SGD using full-precision gradients, but with 46% less wall clock time.
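As a rough illustration of the blockwise 1-bit compression with error-feedback sketched in the abstract, the Python/NumPy snippet below compresses each gradient block to its signs plus a single scaling factor, and keeps the discarded compression error so it can be added back to the next gradient. The block size, the mean-absolute-value scaling factor, and the names blockwise_1bit_compress and ErrorFeedbackWorker are illustrative assumptions, not the paper's exact construction; Nesterov momentum and the server-side half of the two-way compression are omitted.

    import numpy as np

    def blockwise_1bit_compress(grad, block_size=4096):
        # grad: 1-D array (flattened gradient). Each block is replaced by
        # sign(block) * scale, where scale is one float per block
        # (mean absolute value is used here as an assumed choice).
        compressed = np.empty_like(grad)
        for start in range(0, grad.size, block_size):
            block = grad[start:start + block_size]
            scale = np.mean(np.abs(block))
            compressed[start:start + block_size] = scale * np.sign(block)
        return compressed

    class ErrorFeedbackWorker:
        # Worker-side state: accumulate what compression discarded and
        # add it back before compressing the next gradient.
        def __init__(self, dim):
            self.residual = np.zeros(dim)

        def step(self, grad):
            corrected = grad + self.residual          # error-feedback correction
            sent = blockwise_1bit_compress(corrected) # signs + per-block scales are transmitted
            self.residual = corrected - sent          # carry the compression error forward
            return sent

    # Usage sketch:
    worker = ErrorFeedbackWorker(dim=10_000)
    g = np.random.randn(10_000)
    update = worker.step(g)

In this sketch, transmitting only the signs plus one float per block is what yields the roughly 32x saving: each 32-bit gradient entry is reduced to a single bit, with a small overhead for the per-block scaling factors.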
Pages: 11