Communication-Constrained Distributed Learning: TSI-Aided Asynchronous Optimization with Stale Gradient

Cited: 0
Authors
Yu, Siyuan [1 ,2 ]
Chen, Wei [1 ,2 ]
Poor, H. Vincent [3 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol BNRist, Beijing, Peoples R China
[3] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
Funding
US National Science Foundation; National Natural Science Foundation of China;
Keywords
Asynchronous optimization; stochastic gradient descent; timing side information; gradient staleness; federated learning;
DOI
10.1109/GLOBECOM54140.2023.10437351
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
Distributed machine learning, including federated learning, has attracted considerable attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy. As one of the key enablers of distributed learning, asynchronous optimization allows multiple workers to process data simultaneously without incurring synchronization delay. However, under limited communication bandwidth, asynchronous optimization can be hampered by gradient staleness, which severely hinders the learning process. In this paper, we present a communication-constrained distributed learning scheme in which asynchronous stochastic gradients generated by parallel workers are transmitted over a shared medium or link. Our aim is to minimize the average training time by striking the optimal tradeoff between the number of parallel workers and their gradient staleness. To this end, we formulate a queueing-theoretic model that allows us to find the optimal number of workers participating in the asynchronous optimization. Furthermore, we leverage the packet arrival times at the parameter server, referred to as Timing Side Information (TSI), to compress the staleness information for staleness-aware Asynchronous Stochastic Gradient Descent (Asyn-SGD). Numerical results demonstrate a substantial reduction in training time owing to both worker selection and the TSI-aided compression of staleness information.
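For intuition, the following is a minimal sketch of staleness-aware asynchronous SGD on a toy least-squares problem, in which the parameter server damps each update by the staleness of the gradient it receives. The 1/(1 + staleness) damping, the quadratic objective, and all function names are illustrative assumptions only; they are not the scheme proposed in the paper, which additionally selects the number of workers via a queueing model and compresses the staleness information using TSI.

```python
# Minimal sketch of staleness-aware asynchronous SGD on a toy quadratic
# objective. The 1/(1 + staleness) damping and all names below are
# illustrative assumptions, not the algorithm from the paper.
import numpy as np

def grad(w, x, y):
    """Gradient of the squared error 0.5 * (w.x - y)^2 for one sample."""
    return (np.dot(w, x) - y) * x

def asynchronous_sgd(num_workers=4, num_updates=2000, lr=0.1, dim=5, seed=0):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)       # ground-truth model the workers try to recover
    w = np.zeros(dim)                   # shared model kept by the parameter server
    version = 0                         # server-side model version counter
    # Each worker remembers the model version it last pulled; gradients computed
    # from an old version arrive "stale" at the server.
    worker_versions = [0] * num_workers
    worker_models = [w.copy() for _ in range(num_workers)]

    for _ in range(num_updates):
        k = rng.integers(num_workers)   # a random worker finishes next (models asynchrony)
        x = rng.normal(size=dim)
        y = np.dot(w_true, x)
        g = grad(worker_models[k], x, y)  # gradient based on the worker's (possibly old) model

        staleness = version - worker_versions[k]
        step = lr / (1.0 + staleness)   # assumed staleness-aware damping of the step size
        w -= step * g                   # server applies the (stale) gradient
        version += 1

        worker_models[k] = w.copy()     # worker pulls the fresh model for its next gradient
        worker_versions[k] = version

    return np.linalg.norm(w - w_true)

if __name__ == "__main__":
    print("final error:", asynchronous_sgd())
```

Damping the step size by staleness is one common way to keep very old gradients from destabilizing the shared model; the paper's contribution lies in choosing how many workers should participate and in conveying the staleness information cheaply via packet arrival times.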
Pages: 1495 - 1500
Page count: 6