An efficient bandwidth-adaptive gradient compression algorithm for distributed training of deep neural networks

Cited: 1
Authors
Wang, Zeqin [1 ]
Duan, Qingyang [1 ]
Xu, Yuedong [1 ]
Zhang, Liang [2 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Huawei Technol, Nanjing Res Ctr, Nanjing 210096, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Distributed deep learning; Gradient compression; Adaptive sparsification; Dynamic bandwidth; COMMUNICATION; OPTIMIZATION;
DOI
10.1016/j.sysarc.2024.103116
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
In distributed deep learning with data parallelism, the communication bottleneck throttles the efficiency of model training. Recent studies adopt versatile gradient compression techniques, with gradient sparsification standing out as an effective approach for reducing the number of gradients to be transmitted. However, the deployment of gradient sparsification is adversely affected by changes in the network environment of real systems, and existing methods either neglect bandwidth dynamics during training or suffer drastic fluctuations of the compression ratio. In this paper, we propose ACE, a novel adaptive gradient compression mechanism with high communication efficiency under bandwidth variation. ACE adapts the sparsification ratio to the average bandwidth in a time window, rather than following its dynamics exactly. To accurately compute the compression ratio, we first profile the compression time and model a single iteration time consisting of communication, computation and compression operations. We then present a practical model to fit the number of training rounds needed until convergence, and formulate an optimization problem to compute the optimal sparsification ratio. We conduct experiments on different DNN models in different network environments and compare various methods in terms of convergence speed and model quality. The experimental results show that ACE achieves up to 9.39× and 1.28× training speedups over fixed and state-of-the-art adaptive compression methods, respectively.
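The abstract outlines ACE's selection loop: average the measured bandwidth over a time window, model one iteration as computation plus compression plus communication, fit the number of rounds needed to converge as a function of the sparsification ratio, and pick the ratio that minimizes the estimated total training time. The sketch below is a minimal, hypothetical illustration of that step; the function names, cost-model constants, and the fitted rounds-to-convergence form are assumptions made for illustration, not the paper's actual formulation.

```python
# Hypothetical sketch of the ratio-selection step described in the abstract.
# All constants and the fitted forms below are illustrative assumptions.
import numpy as np


def average_bandwidth(samples_mbps: list[float]) -> float:
    """Average the bandwidth measurements collected over the current window."""
    return float(np.mean(samples_mbps))


def iteration_time(ratio: float, grad_size_mb: float, compute_s: float,
                   bw_mbps: float, compress_coef_s: float = 0.05) -> float:
    """Model one iteration: computation + compression + communication.

    `ratio` is the fraction of gradients kept (0 < ratio <= 1). The compression
    cost is assumed to grow mildly with the kept fraction; in practice it would
    be profiled offline, as the paper describes.
    """
    comm_s = (ratio * grad_size_mb * 8.0) / bw_mbps       # transmit only the kept gradients
    compress_s = compress_coef_s * (1.0 + ratio)          # assumed profiled compression cost
    return compute_s + compress_s + comm_s


def rounds_to_converge(ratio: float, base_rounds: int = 10_000,
                       penalty: float = 0.3) -> float:
    """Assumed fitted model: sparser updates need more rounds to converge."""
    return base_rounds * (1.0 + penalty * np.log(1.0 / ratio))


def choose_ratio(bw_samples_mbps: list[float], grad_size_mb: float,
                 compute_s: float, candidates=None) -> float:
    """Pick the sparsification ratio minimizing estimated total training time."""
    candidates = candidates or [0.001, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0]
    bw = average_bandwidth(bw_samples_mbps)
    return min(candidates,
               key=lambda r: rounds_to_converge(r)
               * iteration_time(r, grad_size_mb, compute_s, bw))


if __name__ == "__main__":
    # Example: ~100 MB of gradients, 0.2 s of computation, a fluctuating link.
    print(choose_ratio([800.0, 650.0, 720.0], grad_size_mb=100.0, compute_s=0.2))
```

Windowed averaging is what keeps the chosen ratio stable: the candidate search reacts to the mean bandwidth of the window rather than to every instantaneous fluctuation of the link.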
Pages: 14
Related Papers
50 records in total
  • [31] DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks. Wiedemann, Simon; Kirchhoffer, Heiner; Matlage, Stefan; Haase, Paul; Marban, Arturo; Marinc, Talmaj; Neumann, David; Nguyen, Tung; Schwarz, Heiko; Wiegand, Thomas; Marpe, Detlev; Samek, Wojciech. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(4): 700-714.
  • [32] Efficient and Structural Gradient Compression with Principal Component Analysis for Distributed Training. Tan, Jiaxin; Yao, Chao; Guo, Zehua. Proceedings of the 7th Asia-Pacific Workshop on Networking (APNet 2023), 2023: 217-218.
  • [33] An Efficient Method for Training Deep Learning Networks Distributed. Wang, Chenxu; Lu, Yutong; Chen, Zhiguang; Li, Junnan. IEICE Transactions on Information and Systems, 2020, E103D(12): 2444-2456.
  • [34] Research and design of distributed training algorithm for neural networks. Yang, B.; Wang, Y. D.; Su, X. H. Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005: 4044-4049.
  • [35] Speaker Adaptive Training Using Deep Neural Networks. Ochiai, Tsubasa; Matsuda, Shigeki; Lu, Xugang; Hori, Chiori; Katagiri, Shigeru. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
  • [36] Improvements to Speaker Adaptive Training of Deep Neural Networks. Miao, Yajie; Jiang, Lu; Zhang, Hao; Metze, Florian. 2014 IEEE Workshop on Spoken Language Technology (SLT 2014), 2014: 165-170.
  • [37] The adaptive fuzzy training algorithm for feedforward neural networks. Xie, P.; Liu, B. Xi Tong Gong Cheng Yu Dian Zi Ji Shu / Systems Engineering and Electronics, 2001, 23(7): 79-82.
  • [38] An Adaptive Gradient Method with Differentiation Element in Deep Neural Networks. Wang, Runqi; Wang, Wei; Ma, Teli; Zhang, Baochang. Proceedings of the 15th IEEE Conference on Industrial Electronics and Applications (ICIEA 2020), 2020: 1582-1587.
  • [39] Gradient Descent Analysis: On Visualizing the Training of Deep Neural Networks. Becker, Martin; Lippel, Jens; Zielke, Thomas. Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (IVAPP), Vol 3, 2019: 338-345.
  • [40] AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training. Chen, Chia-Yu; Choi, Jungwook; Brand, Daniel; Agrawal, Ankur; Zhang, Wei; Gopalakrishnan, Kailash. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 2827-2835.