Communication-Censored Distributed Stochastic Gradient Descent

Times Cited: 11
Authors
Li, Weiyu [1 ,2 ]
Wu, Zhaoxian [1 ,3 ,4 ]
Chen, Tianyi [5 ]
Li, Liping [6 ]
Ling, Qing [1 ,3 ,4 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Univ Sci & Technol China, Sch Gifted Young, Hefei 230026, Peoples R China
[3] Sun Yat Sen Univ, Guangdong Prov Key Lab Computat Sci, Guangzhou 510006, Peoples R China
[4] Pazhou Lab, Guangzhou 510300, Peoples R China
[5] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
[6] Univ Sci & Technol China, Dept Automat, Hefei 230027, Peoples R China
Keywords
Servers; Convergence; Optimization; Stochastic processes; Machine learning algorithms; Signal processing algorithms; Communication censoring; communication efficiency; distributed optimization; stochastic gradient descent (SGD); algorithms
DOI
10.1109/TNNLS.2021.3083655
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This article develops a communication-efficient algorithm for solving stochastic optimization problems defined over a distributed network, aiming to reduce the burdensome communication in applications such as distributed machine learning. Different from existing works based on quantization and sparsification, we introduce a communication-censoring technique to reduce the transmission of variables, which leads to our communication-censored distributed stochastic gradient descent (CSGD) algorithm. Specifically, in CSGD, the latest minibatch stochastic gradient at a worker is transmitted to the server if and only if it is sufficiently informative; when the latest gradient is not transmitted, the server reuses the stale one. To make this communication-censoring strategy effective, the batch size increases over iterations so as to alleviate the effect of stochastic gradient noise. Theoretically, CSGD enjoys the same order of convergence rate as SGD while effectively reducing communication. Numerical experiments demonstrate the sizable communication savings of CSGD.
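The censoring rule described in the abstract lends itself to a short sketch. Below is a minimal, illustrative Python implementation assuming a synthetic least-squares objective; the threshold schedule tau, the learning rate, and the batch-size schedule are assumptions for illustration, not the paper's exact choices.

```python
# Minimal sketch of communication-censored SGD (CSGD) as described in the
# abstract. The objective, threshold schedule, and step size are illustrative
# assumptions, not the paper's exact parameters.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_iters, n_samples = 5, 4, 200, 100
x = np.zeros(d)        # model parameters kept at the server
lr = 0.1               # step size (assumption)

# Synthetic per-worker least-squares data (assumption).
A = [rng.normal(size=(n_samples, d)) for _ in range(n_workers)]
b = [Ai @ np.ones(d) + 0.1 * rng.normal(size=n_samples) for Ai in A]

# Stale gradients cached at the server, one per worker.
last_sent = [np.zeros(d) for _ in range(n_workers)]
transmissions = 0

for k in range(n_iters):
    batch_size = min(10 + k, n_samples)  # increasing batch size damps gradient noise
    tau = 1.0 / (k + 1)                  # censoring threshold schedule (assumption)
    grads = []
    for i in range(n_workers):
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        g = A[i][idx].T @ (A[i][idx] @ x - b[i][idx]) / batch_size
        # Censoring: transmit only if the new gradient differs enough from the
        # last transmitted one; otherwise the server reuses the stale copy.
        if np.linalg.norm(g - last_sent[i]) >= tau:
            last_sent[i] = g
            transmissions += 1
        grads.append(last_sent[i])
    x -= lr * np.mean(grads, axis=0)     # server aggregates and updates

print(f"transmissions: {transmissions} / {n_iters * n_workers} possible")
```

Running the sketch typically shows far fewer transmissions than the worker-iteration count while the iterate still converges, mirroring the communication savings the abstract reports.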
Pages: 6831 - 6843
Page count: 13
Related Papers
50 records in total
  • [1] Distributed Stochastic Gradient Descent With Compressed and Skipped Communication
    Phuong, Tran Thi
    Phong, Le Trieu
    Fukushima, Kazuhide
    [J]. IEEE ACCESS, 2023, 11 : 99836 - 99846
  • [2] Distributed Stochastic Gradient Descent with Event-Triggered Communication
    George, Jemin
    Gurram, Prudhvi
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7169 - 7178
  • [3] Bayesian Distributed Stochastic Gradient Descent
    Teng, Michael
    Wood, Frank
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] COKE: Communication-Censored Decentralized Kernel Learning
    Xu, Ping
    Wang, Yue
    Chen, Xiang
    Tian, Zhi
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22: 1 - 35
  • [5] Communication-Censored ADMM for Decentralized Consensus Optimization
    Liu, Yaohua
    Xu, Wei
    Wu, Gang
    Tian, Zhi
    Ling, Qing
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (10) : 2565 - 2579
  • [6] An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent
    Tu, Jun
    Zhou, Jia
    Ren, Donglin
    [J]. COMPUTER COMMUNICATIONS, 2022, 195 : 416 - 423
  • [7] Predicting Throughput of Distributed Stochastic Gradient Descent
    Li, Zhuojin
    Paolieri, Marco
    Golubchik, Leana
    Lin, Sung-Han
    Yan, Wumo
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 2900 - 2912
  • [8] Distributed stochastic gradient descent with discriminative aggregating
    Chen, Zhen-Hong
    Lan, Yan-Yan
    Guo, Jia-Feng
    Cheng, Xue-Qi
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38(10): 2054 - 2063
  • [9] Communication-Censored Linearized ADMM for Decentralized Consensus Optimization
    Li, Weiyu
    Liu, Yaohua
    Tian, Zhi
    Ling, Qing
    [J]. IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2020, 6(1): 18 - 34