Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning

Cited by: 0
Authors
Yoon, Daegun [1 ]
Oh, Sangyoon [2 ]
Affiliations
[1] ETRI, Daejeon, South Korea
[2] Ajou Univ, Suwon, South Korea
Keywords
distributed deep learning; gradient sparsification; scalability;
DOI
10.1109/CCGrid59990.2024.00043
CLC number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Communication overhead is a major obstacle to scaling distributed training systems. Gradient sparsification is a promising optimization that reduces communication volume without significant loss of model fidelity. However, existing gradient sparsification methods scale poorly because of inefficient algorithm design, which significantly raises communication overhead. In particular, gradient build-up and inadequate sparsity control degrade sparsification performance considerably. Moreover, communication traffic increases drastically owing to an imbalanced gradient-selection workload between workers. To address these challenges, we propose a novel gradient sparsification scheme called ExDyna. In ExDyna, the gradient tensor of the model is divided into fine-grained blocks, and contiguous blocks are grouped into non-overlapping partitions. Each worker selects gradients only within its exclusively allocated partition, so gradient build-up never occurs. To balance the gradient-selection workload between workers, ExDyna adjusts the topology of the partitions by comparing the workloads of adjacent partitions. In addition, ExDyna supports online threshold scaling, which estimates an accurate threshold for gradient selection on-the-fly. Accordingly, ExDyna satisfies the user-required sparsity level throughout training, regardless of the model or dataset. ExDyna therefore enhances the scalability of distributed training systems by preserving a near-optimal gradient sparsification cost. In experiments, ExDyna outperformed state-of-the-art sparsifiers in terms of training speed and sparsification performance while achieving high accuracy.
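The abstract outlines ExDyna's two core mechanisms: partition-exclusive gradient selection and online threshold scaling. The following is a minimal sketch of those ideas for a single worker, assuming a flattened gradient array and a simple multiplicative threshold update; the names (select_partition, scale_threshold, TARGET_DENSITY) and the update rule are illustrative assumptions, not the paper's actual implementation.

import numpy as np

# Hypothetical sketch: names and the multiplicative update rule are
# illustrative assumptions, not ExDyna's actual implementation.
TARGET_DENSITY = 0.01  # user-required sparsity level (keep ~1% of gradients)

def select_partition(grad, start, end, threshold):
    # Each worker selects gradients only inside its exclusive partition
    # [start, end); no other worker selects here, so no gradient build-up.
    local = grad[start:end]
    idx = np.nonzero(np.abs(local) >= threshold)[0]
    return idx + start, local[idx]

def scale_threshold(threshold, num_selected, partition_size):
    # Online threshold scaling: nudge the threshold so the observed density
    # tracks TARGET_DENSITY in the next iteration.
    density = num_selected / max(partition_size, 1)
    ratio = density / TARGET_DENSITY
    # Raise the threshold when too many gradients were picked,
    # lower it when too few were picked, with a clamp for stability.
    return threshold * min(max(ratio, 0.5), 2.0)

# Toy usage for one worker and one iteration.
rng = np.random.default_rng(0)
grad = rng.standard_normal(1_000_000).astype(np.float32)
start, end = 0, 250_000          # this worker's exclusively allocated partition
threshold = 1e-3
indices, values = select_partition(grad, start, end, threshold)
threshold = scale_threshold(threshold, len(indices), end - start)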
Pages: 320-329
Page count: 10
Related papers
50 in total
  • [1] Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning
    Yoon, Daegun
    Oh, Sangyoon
arXiv
  • [2] AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost
    Chen, Jinfan
    Li, Shigang
    Guo, Ran
    Yuan, Jinhui
    Hoefler, Torsten
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (08) : 1331 - 1344
  • [3] Near-Optimal Sparse Allreduce for Distributed Deep Learning
    Li, Shigang
    Hoefler, Torsten
    PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2022, : 135 - 149
  • [4] Near-Optimal Straggler Mitigation for Distributed Gradient Methods
    Li, Songze
    Kalan, Seyed Mohammadreza Mousavi
    Avestimehr, A. Salman
    Soltanolkotabi, Mahdi
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 857 - 866
  • [5] Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning
    Yan, Zijie
    Xiao, Danyang
    Chen, Mengqiang
    Zhou, Jieying
    Wu, Weigang
PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020
  • [6] OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning
    Gao, Yunqi
    Zhang, Zechao
    Hu, Bing
    Jin, A-Long
    Wu, Chunming
    PARALLEL COMPUTING, 2023, 118
  • [7] Distributed near-optimal matching
    Deng, XT
    COMBINATORICA, 1996, 16 (04) : 453 - 464
  • [8] Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    Qin, Yang
    Liu, Ruihao
    Zhao, Xinxiao
    IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2020, : 406 - 415
  • [9] Scalable distributed algorithms for multi-robot near-optimal motion planning
    Zhao, Guoxiang
    Zhu, Minghui
    AUTOMATICA, 2022, 140
  • [10] Scalable distributed algorithms for multi-robot near-optimal motion planning
    Zhao, Guoxiang
    Zhu, Minghui
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 226 - 231