Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning

Cited by: 0
Authors
Yoon, Daegun [1 ]
Oh, Sangyoon [2 ]
Affiliations
[1] ETRI, Daejeon, South Korea
[2] Ajou Univ, Suwon, South Korea
Keywords
distributed deep learning; gradient sparsification; scalability
DOI
10.1109/CCGrid59990.2024.00043
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Code
081203; 0835
Abstract
Communication overhead is a major obstacle to scaling distributed training systems. Gradient sparsification is a promising optimization that reduces communication volume without significant loss of model fidelity. However, existing gradient sparsification methods scale poorly because of inefficiencies in their algorithm design, which significantly increase communication overhead. In particular, gradient build-up and inadequate sparsity control degrade sparsification performance considerably, and communication traffic rises drastically when the gradient-selection workload is imbalanced across workers. To address these challenges, we propose a novel gradient sparsification scheme called ExDyna. In ExDyna, the gradient tensor of the model is divided into fine-grained blocks, and contiguous blocks are grouped into non-overlapping partitions. Each worker selects gradients only within its exclusively allocated partition, so gradient build-up never occurs. To balance the gradient-selection workload across workers, ExDyna adjusts the partition topology by comparing the workloads of adjacent partitions. In addition, ExDyna supports online threshold scaling, which accurately estimates the gradient-selection threshold on the fly. Consequently, ExDyna can maintain the user-required sparsity level throughout training regardless of model or dataset. ExDyna thus enhances the scalability of distributed training systems by preserving a near-optimal gradient sparsification cost. In experiments, ExDyna outperformed state-of-the-art sparsifiers in training speed and sparsification performance while achieving high accuracy.
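The following is a minimal sketch, not the authors' implementation, of the two ideas named in the abstract: per-worker selection restricted to an exclusive partition (so no gradient build-up) and an online-scaled selection threshold that tracks a user-required density. The helper names (select_in_partition, scale_threshold) and the simple proportional update rule are illustrative assumptions; ExDyna's actual partition-topology adjustment between adjacent partitions is omitted here.

```python
import numpy as np

def select_in_partition(grad, start, end, threshold):
    """Select entries in this worker's exclusive partition [start, end)
    whose magnitude exceeds the current threshold.
    Non-overlapping partitions mean no two workers pick the same index."""
    idx = np.nonzero(np.abs(grad[start:end]) > threshold)[0] + start
    return idx, grad[idx]

def scale_threshold(threshold, num_selected, partition_size, target_density, gain=0.5):
    """Illustrative online threshold scaling: nudge the threshold so the
    achieved selection density tracks the user-required density."""
    achieved = num_selected / max(partition_size, 1)
    # Raise the threshold when too many gradients were selected, lower it otherwise.
    return threshold * (1.0 + gain * (achieved - target_density) / target_density)

# Toy usage: one worker, one partition, target density 1%.
rng = np.random.default_rng(0)
grad = rng.standard_normal(1_000_000).astype(np.float32)
start, end = 0, 250_000          # this worker's exclusive partition
threshold = 2.5                  # initial guess
for step in range(5):
    idx, vals = select_in_partition(grad, start, end, threshold)
    threshold = scale_threshold(threshold, idx.size, end - start, 0.01)
    print(f"step {step}: selected {idx.size}, new threshold {threshold:.3f}")
```

Because each index belongs to exactly one worker's partition, the sparse messages exchanged after selection contain no duplicated gradients, which is the "no build-up" property the abstract refers to; the threshold update keeps the total selected volume near the requested sparsity without a global top-k sort.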
Pages: 320-329
Page count: 10