Optimization-Based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

Cited by: 2
Authors:
Wang, Qi [1 ]
Cui, Ying [2 ]
Li, Chenglin [1 ]
Zou, Junni [1 ]
Xiong, Hongkai [1 ]
Affiliations:
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, IoT Thrust, Guangzhou 511400, Peoples R China
Funding:
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords:
Big Data; coded computation; distributed learning; gradient coding; stochastic optimization;
DOI:
10.1109/TSP.2023.3244084
CLC classification:
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes:
0808; 0809;
Abstract:
Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers, as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model, and focus on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which subsumes most existing gradient coding schemes. Then, we consider the minimization of the expected overall runtime and the maximization of the completion probability with respect to the L coding parameters for coordinates, both of which are challenging discrete optimization problems. To reduce computational complexity, we first transform each into an equivalent but much simpler discrete problem with N (≪ L) variables representing the partition of the L coordinates into N blocks, each with identical redundancy. This indicates an equivalent but more easily implemented block coordinate gradient coding scheme with N coding parameters for blocks. Then, we adopt continuous relaxation to further reduce computational complexity. For the resulting minimization of the expected overall runtime, we develop an iterative algorithm of computational complexity O(N^2) to obtain an optimal solution and derive two closed-form approximate solutions, both with computational complexity O(N).
For the resulting maximization of the completion probability, we develop an iterative algorithm of computational complexity O(N^2) to obtain a stationary point and derive a closed-form approximate solution with computational complexity O(N) at a large threshold. Finally, numerical results show that the proposed solutions significantly outperform existing coded computation schemes and their extensions.
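To make the block coordinate idea in the abstract concrete, the sketch below illustrates (under repetition-style gradient coding, a simplification of the paper's general scheme) how L coordinates can be partitioned into N blocks and how a per-block redundancy r determines straggler tolerance: with cyclic placement, a block's gradient contribution is recoverable once every data partition has been reported by at least one finished worker. The function names and the cyclic placement are illustrative assumptions, not the paper's exact construction.

```python
def partition_coordinates(L, N):
    # Split L coordinates into N contiguous blocks of (roughly) equal size.
    sizes = [L // N + (1 if i < L % N else 0) for i in range(N)]
    blocks, start = [], 0
    for s in sizes:
        blocks.append(range(start, start + s))
        start += s
    return blocks

def cyclic_assignment(N, r):
    # Cyclic-repetition placement: worker w processes data partitions
    # w, w+1, ..., w+r-1 (mod N), so each partition is replicated r times.
    return [[(w + j) % N for j in range(r)] for w in range(N)]

def block_recoverable(assignment, finished_workers):
    # Under repetition coding, the block's gradient sum is recoverable once
    # every data partition appears in some finished worker's assignment,
    # i.e., up to r - 1 full stragglers per block can be tolerated.
    covered = set()
    for w in finished_workers:
        covered.update(assignment[w])
    return len(covered) == len(assignment)

# Toy example: L = 10 coordinates, N = 4 workers, redundancy r = 2 for one block.
blocks = partition_coordinates(10, 4)
asg = cyclic_assignment(4, 2)
print([len(b) for b in blocks])           # block sizes
print(block_recoverable(asg, {0, 2}))     # partitions 0..3 all covered
print(block_recoverable(asg, {0, 1}))     # partition 3 missing
```

Assigning a larger r to some blocks (more redundancy, higher per-worker cost) and a smaller r to others is exactly the per-block trade-off the paper optimizes over its N coding parameters.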
Pages: 1023-1038
Number of pages: 16