Communication-Constrained Distributed Learning: TSI-Aided Asynchronous Optimization with Stale Gradient

Cited by: 0
Authors
Yu, Siyuan [1 ,2 ]
Chen, Wei [1 ,2 ]
Poor, H. Vincent [3 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol BNRist, Beijing, Peoples R China
[3] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
Funding
US National Science Foundation; National Natural Science Foundation of China
Keywords
Asynchronous optimization; stochastic gradient descent; timing side information; gradient staleness; federated learning
DOI
10.1109/GLOBECOM54140.2023.10437351
CLC Classification
TM [Electrical Technology]; TN [Electronic and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Distributed machine learning, including federated learning, has attracted considerable attention due to its potential for scaling computational resources, reducing training time, and helping to protect user privacy. As one of the key enablers of distributed learning, asynchronous optimization allows multiple workers to process data simultaneously without incurring a synchronization delay. However, given limited communication bandwidth, asynchronous optimization can be hampered by gradient staleness, which severely hinders the learning process. In this paper, we present a communication-constrained distributed learning scheme in which asynchronous stochastic gradients generated by parallel workers are transmitted over a shared medium or link. Our aim is to minimize the average training time by striking the optimal tradeoff between the number of parallel workers and their gradient staleness. To this end, a queueing-theoretic model is formulated, which allows us to find the optimal number of workers participating in the asynchronous optimization. Furthermore, we leverage the packet arrival times at the parameter server, also referred to as Timing Side Information (TSI), to compress the staleness information for staleness-aware Asynchronous Stochastic Gradient Descent (Asyn-SGD). Numerical results demonstrate a substantial reduction in training time owing to both the worker selection and the TSI-aided compression of staleness information.
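To make the staleness-aware update concrete, the following is a minimal Python sketch of a server-side rule of the kind the abstract describes: each arriving gradient is applied with a weight that decays in its staleness, and the staleness is inferred from packet arrival order (the timing side information) rather than from an explicitly transmitted tag. The 1/(1 + tau) weighting, the learning rate, the toy quadratic loss, and all function names are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Minimal sketch of staleness-aware asynchronous SGD at a parameter server.
# Assumption: gradients are down-weighted by 1 / (1 + tau), where tau is the
# gradient's staleness in server iterations. The paper's exact rule may differ.

def staleness_weight(tau):
    """Illustrative weight that decays with staleness tau >= 0."""
    return 1.0 / (1.0 + tau)

def apply_stale_gradient(w, grad, read_iter, server_iter, lr=0.1):
    """Apply one stale gradient to the model w.

    read_iter:   server iteration at which the worker read the model;
                 with TSI, the server can infer this from packet arrival
                 times instead of receiving an explicit staleness tag.
    server_iter: current server iteration.
    """
    tau = server_iter - read_iter  # gradient staleness
    return w - lr * staleness_weight(tau) * grad

# Toy usage with loss f(w) = 0.5 * ||w||^2, so the gradient at w is w itself.
w = np.ones(3)
for server_iter, read_iter in enumerate([0, 0, 1, 2]):
    grad = w  # stand-in for the (stale) gradient reported by a worker
    w = apply_stale_gradient(w, grad, read_iter, server_iter)
print(w)  # model after four asynchronous updates
```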
Pages: 1495-1500
Page count: 6
Related Papers
50 records in total
  • [31] Gradient Sparsification for Communication-Efficient Distributed Optimization
    Wangni, Jianqiao
    Wang, Jialei
    Liu, Ji
    Zhang, Tong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [32] An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent
    Tu, Jun
    Zhou, Jia
    Ren, Donglin
    COMPUTER COMMUNICATIONS, 2022, 195 : 416 - 423
  • [33] Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning
    Yan, Zijie
    Xiao, Danyang
    Chen, Mengqiang
    Zhou, Jieying
    Wu, Weigang
    PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020,
  • [34] Distributed Nesterov Gradient and Heavy-Ball Double Accelerated Asynchronous Optimization
    Li, Huaqing
    Cheng, Huqiang
    Wang, Zheng
    Wu, Guo-Cheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (12) : 5723 - 5737
  • [35] A Stochastic Gradient-Based Projection Algorithm for Distributed Constrained Optimization
    Zhang, Keke
    Gao, Shanfu
    Chen, Yingjue
    Zheng, Zuqing
    Lu, Qingguo
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT I, 2024, 14447 : 356 - 367
  • [36] Communication Complexity of Distributed Convex Learning and Optimization
    Arjevani, Yossi
    Shamir, Ohad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [37] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
    Zhang, Lin
    Zhang, Longteng
    Shi, Shaohuai
    Chu, Xiaowen
    Li, Bo
    2023 IEEE 43RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS, 2023, : 361 - 371
  • [38] Gradient-Adaptive Pareto Optimization for Constrained Reinforcement Learning
    Zhou, Zixian
    Huang, Mengda
    Pan, Feiyang
    He, Jia
    Ao, Xiang
    Tu, Dandan
    He, Qing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 11443 - 11451
  • [39] DISTRIBUTED ASYNCHRONOUS ALGORITHMS WITH STOCHASTIC DELAYS FOR CONSTRAINED OPTIMIZATION PROBLEMS WITH CONDITIONS OF TIME DRIFT
    Beidas, B. F.
    Papavassilopoulos, G. P.
    PARALLEL COMPUTING, 1995, 21 (09) : 1431 - 1450
  • [40] SUCAG: Stochastic Unbiased Curvature-aided Gradient Method for Distributed Optimization
    Wai, Hoi-To
    Freris, Nikolaos M.
    Nedic, Angelia
    Scaglione, Anna
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 1751 - 1756