Communication-Constrained Distributed Learning: TSI-Aided Asynchronous Optimization with Stale Gradient

Cited by: 0

Authors:

Yu, Siyuan [1,2]

Chen, Wei [1,2]

Poor, H. Vincent [3]

Affiliations:

[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China

[2] Beijing Natl Res Ctr Informat Sci & Technol BNRis, Beijing, Peoples R China

[3] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA

Source:

IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM | 2023

Funding:

U.S. National Science Foundation; National Natural Science Foundation of China;

Keywords:

Asynchronous optimization; stochastic gradient descent; timing side information; gradient staleness; federated learning;

DOI:

10.1109/GLOBECOM54140.2023.10437351

Chinese Library Classification:

TM [Electrical technology]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Distributed machine learning, including federated learning, has attracted considerable attention for its potential to scale computational resources, reduce training time, and help protect user privacy. As one of the key enablers of distributed learning, asynchronous optimization allows multiple workers to process data simultaneously without incurring synchronization delay. However, given limited communication bandwidth, asynchronous optimization can be hampered by gradient staleness, which severely hinders the learning process. In this paper, we present a communication-constrained distributed learning scheme in which asynchronous stochastic gradients generated by parallel workers are transmitted over a shared medium or link. Our aim is to minimize the average training time by striking the optimal tradeoff between the number of parallel workers and their gradient staleness. To this end, we formulate a queueing-theoretic model, which allows us to find the optimal number of workers participating in the asynchronous optimization. Furthermore, we leverage the packet arrival time at the parameter server, referred to as Timing Side Information (TSI), to compress the staleness information for staleness-aware Asynchronous Stochastic Gradient Descent (Asyn-SGD). Numerical results demonstrate a substantial reduction in training time owing to both worker selection and TSI-aided compression of staleness information.
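
To make the idea of staleness-aware asynchronous SGD concrete, the following is a minimal toy sketch, not the scheme from the paper. The abstract does not give the update rule, so everything here is an assumption for illustration: the quadratic objective, the fixed staleness of `num_workers - 1` updates, the `base_lr / (1 + tau)` step-size scaling (a common heuristic), and the hypothetical function name `staleness_aware_sgd`. It models neither the queueing analysis nor the TSI-aided compression.

```python
import random


def staleness_aware_sgd(num_workers, steps=2000, base_lr=0.1,
                        noise_std=0.1, seed=0):
    """Minimize f(w) = 0.5 * (w - 3)^2 using stale, noisy gradients.

    Toy assumption: one gradient arrives per step over a shared link, and
    each was computed on a parameter copy (num_workers - 1) updates old.
    """
    rng = random.Random(seed)
    target = 3.0
    history = [0.0]  # history[t] is the parameter value after t updates
    for t in range(steps):
        # Staleness: number of updates applied since the gradient's snapshot.
        tau = min(t, num_workers - 1)
        stale_w = history[t - tau]
        grad = (stale_w - target) + rng.gauss(0.0, noise_std)
        # Assumed heuristic: shrink the step size as staleness grows.
        lr = base_lr / (1.0 + tau)
        history.append(history[-1] - lr * grad)
    return history[-1]


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, round(staleness_aware_sgd(n), 3))
```

In this toy setting, adding workers increases staleness and therefore reduces the effective step size, which is one simple way to see why the number of parallel workers and gradient staleness trade off against each other.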


Pages: 1495-1500

Page count: 6
