AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

Cited by: 0
Authors
Chen Y. [1]
Yan Y. [1]
Yang Q. [1]
Shu Y. [1]
He S. [1]
Shi Z. [1]
Chen J. [1]
Affiliations
[1] State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou
Keywords
adaptive quantization; bitwise compression; computational modeling; data compression scheme; data models; distributed parallel edge training; performance evaluation; pipeline processing; pipelines; servers; training
DOI
10.1109/TMC.2024.3389779
Abstract
Training an entire large deep neural network (DNN) on a single edge device is usually infeasible due to limited resources. To enable intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models and deploying each sub-model on a different edge device so that the devices collaboratively train the DNN. However, two factors can significantly slow down such training: the communication overhead of the large volume of data transmitted from one device to another, and sub-optimal partition points caused by inaccurate prediction of the computation latency at each edge device. In this paper, we propose AccEPT, an acceleration scheme for edge collaborative pipeline-parallel training. In particular, we propose a lightweight adaptive latency predictor that accurately estimates the computation latency of each layer on different devices and adapts to unseen devices through continuous learning. The resulting latency estimates enable better model partitioning that balances the computation load across participating devices. Moreover, we propose a bit-level, computation-efficient data compression scheme to compress the data transmitted between devices during training. Our numerical results demonstrate that the proposed approach speeds up edge pipeline-parallel training by up to 3 times in the considered experimental settings.
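The partitioning idea in the abstract can be illustrated with a minimal sketch: given predicted per-layer computation latencies, choose contiguous partition points so that the slowest pipeline stage is as fast as possible. This is not the authors' implementation; the function name, the number of devices, and the example latency values below are hypothetical, shown only to make the load-balancing step concrete.

from itertools import combinations

def balanced_partition(layer_latencies, num_devices):
    # Split per-layer latency estimates into contiguous stages so that the
    # slowest stage (the pipeline bottleneck) has the smallest possible load.
    n = len(layer_latencies)
    best_cuts, best_bottleneck = None, float("inf")
    # Enumerate every placement of num_devices - 1 cut points between layers
    # (exhaustive search is fine for the small layer counts sketched here).
    for cuts in combinations(range(1, n), num_devices - 1):
        bounds = (0,) + cuts + (n,)
        stage_loads = [sum(layer_latencies[a:b]) for a, b in zip(bounds, bounds[1:])]
        if max(stage_loads) < best_bottleneck:
            best_cuts, best_bottleneck = cuts, max(stage_loads)
    return best_cuts, best_bottleneck

# Hypothetical per-layer latency predictions (ms) produced by a latency predictor.
latencies = [4.0, 7.5, 12.0, 3.0, 9.0, 6.5, 2.0, 5.0]
cuts, bottleneck = balanced_partition(latencies, num_devices=3)
print("cut before layers", cuts, "bottleneck stage latency (ms):", bottleneck)

The sketch minimizes the bottleneck stage because, in pipeline-parallel training, throughput is limited by the slowest stage; more accurate per-layer latency predictions therefore translate directly into better-balanced partitions.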
Pages: 1-15
Number of pages: 14