An efficient bandwidth-adaptive gradient compression algorithm for distributed training of deep neural networks

Cited by: 1
Authors
Wang, Zeqin [1]
Duan, Qingyang [1]
Xu, Yuedong [1]
Zhang, Liang [2]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Huawei Technol, Nanjing Res Ctr, Nanjing 210096, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
Distributed deep learning; Gradient compression; Adaptive sparsification; Dynamic bandwidth; COMMUNICATION; OPTIMIZATION;
DOI
10.1016/j.sysarc.2024.103116
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In distributed deep learning with data parallelism, the communication bottleneck throttles the efficiency of model training. Recent studies adopt various gradient compression techniques, with communication sparsification standing out as an effective approach for reducing the number of gradients to be transmitted. However, the deployment of gradient sparsification is adversely affected by changes in the network environment of real systems, and existing methods either neglect bandwidth dynamics during training or suffer from drastic fluctuations in the compression ratio. In this paper, we propose ACE, a novel adaptive gradient compression mechanism that achieves high communication efficiency under bandwidth variation. ACE adapts the sparsification ratio to the average bandwidth over a time window, rather than following the bandwidth dynamics exactly. To compute the compression ratio accurately, we first profile the compression time and model the time of a single iteration, which consists of communication, computation and compression operations. We then present a practical model to fit the number of training rounds needed until convergence, and formulate an optimization problem to compute the optimal sparsification ratio. We conduct experiments on different DNN models in different network environments and compare various methods in terms of convergence and model quality. The experimental results show that ACE achieves up to 9.39× and 1.28× training speedups over fixed and state-of-the-art adaptive compression methods, respectively.
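The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `topk_sparsify`, `WindowedRatioSelector` and `rounds_fn`, the candidate ratios, and the cost formula are all assumptions made for illustration. The sketch assumes top-k magnitude sparsification, a per-iteration time of computation + compression + communication, a caller-supplied placeholder for the fitted "rounds until convergence" model, and a simple search over a few candidate ratios in place of the paper's optimization problem. Index overhead of the sparse format is ignored.

```python
from collections import deque
from statistics import mean


def topk_sparsify(grad, ratio):
    """Keep about ratio * len(grad) largest-magnitude entries as (indices, values)."""
    k = max(1, round(ratio * len(grad)))
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    kept = sorted(order[:k])
    return kept, [grad[i] for i in kept]


class WindowedRatioSelector:
    """Pick a sparsification ratio from the average bandwidth over a sliding window.

    Hypothetical helper: every parameter is an assumed input, not a value
    taken from the paper.
    """

    def __init__(self, window, candidates, model_bits,
                 compute_s, compress_s, rounds_fn):
        self.samples = deque(maxlen=window)  # most recent bandwidth samples, bits/s
        self.candidates = list(candidates)   # ratios to choose from, e.g. 0.01 .. 1.0
        self.model_bits = model_bits         # size of the dense gradient in bits
        self.compute_s = compute_s           # profiled computation time per iteration
        self.compress_s = compress_s         # profiled compression time per iteration
        self.rounds_fn = rounds_fn           # placeholder: ratio -> rounds to converge

    def observe(self, bandwidth_bps):
        """Record one bandwidth measurement; old samples fall out of the window."""
        self.samples.append(bandwidth_bps)

    def iteration_time(self, ratio):
        """Computation + compression + communication at the windowed mean bandwidth.

        Requires at least one call to observe() beforehand.
        """
        comm_s = ratio * self.model_bits / mean(self.samples)
        return self.compute_s + self.compress_s + comm_s

    def best_ratio(self):
        """Return the candidate minimizing estimated total time (rounds * time)."""
        return min(self.candidates,
                   key=lambda r: self.rounds_fn(r) * self.iteration_time(r))


# Example use with made-up numbers.
selector = WindowedRatioSelector(
    window=3, candidates=[0.01, 0.1, 1.0], model_bits=8e8,
    compute_s=0.1, compress_s=0.02,
    rounds_fn=lambda r: 1000 * (1 + 0.05 / r),  # assumed: sparser needs more rounds
)
for bw in (1e9, 1e9, 1e8):  # the last sample is a short bandwidth dip
    selector.observe(bw)
ratio = selector.best_ratio()
indices, values = topk_sparsify([0.1, -3.0, 0.5, 2.0], ratio)
```

Because the selector averages over the window, a single low-bandwidth sample shifts the estimate only partially, which is one way to avoid the drastic fluctuation of compression ratios that the abstract attributes to methods tracking bandwidth exactly.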
Pages: 14