AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

Cited by: 0
Authors
Chen Y. [1]
Yan Y. [1]
Yang Q. [1]
Shu Y. [1]
He S. [1]
Shi Z. [1]
Chen J. [1]
Affiliations
[1] State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou
Keywords
adaptive quantization; bitwise compression; computational modeling; data compression scheme; data models; distributed parallel edge training; performance evaluation; pipeline processing; pipelines; servers; training
DOI
10.1109/TMC.2024.3389779
Abstract
Training an entire large deep neural network (DNN) on a single edge device is usually infeasible due to limited resources. To enable intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models and deploying each sub-model on a different edge device so that the devices collaboratively train the DNN. However, two factors can significantly slow down such training: the communication overhead of the large volume of data transmitted from one device to another, and sub-optimal partition points caused by inaccurate prediction of the computation latency at each edge device. In this paper, we propose AccEPT, an acceleration scheme for edge collaborative pipeline-parallel training. In particular, we propose a lightweight adaptive latency predictor that accurately estimates the computation latency of each layer on different devices and adapts to unseen devices through continuous learning. The resulting latency estimates enable better model partitioning that balances the computation load across participating devices. Moreover, we propose a bit-level, computation-efficient data compression scheme to compress the data transmitted between devices during training. Our numerical results demonstrate that the proposed approach speeds up edge pipeline-parallel training by up to 3 times in the considered experimental settings.
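The partitioning idea in the abstract can be illustrated with a minimal sketch: given predicted per-layer computation latencies, choose contiguous partition points so that the slowest pipeline stage is as fast as possible. This is not the authors' implementation; the function name, the number of devices, and the example latency values below are hypothetical, shown only to make the load-balancing step concrete.

from itertools import combinations

def balanced_partition(layer_latencies, num_devices):
    # Split per-layer latency estimates into contiguous stages so that the
    # slowest stage (the pipeline bottleneck) has the smallest possible load.
    n = len(layer_latencies)
    best_cuts, best_bottleneck = None, float("inf")
    # Enumerate every placement of num_devices - 1 cut points between layers
    # (exhaustive search is fine for the small layer counts sketched here).
    for cuts in combinations(range(1, n), num_devices - 1):
        bounds = (0,) + cuts + (n,)
        stage_loads = [sum(layer_latencies[a:b]) for a, b in zip(bounds, bounds[1:])]
        if max(stage_loads) < best_bottleneck:
            best_cuts, best_bottleneck = cuts, max(stage_loads)
    return best_cuts, best_bottleneck

# Hypothetical per-layer latency predictions (ms) produced by a latency predictor.
latencies = [4.0, 7.5, 12.0, 3.0, 9.0, 6.5, 2.0, 5.0]
cuts, bottleneck = balanced_partition(latencies, num_devices=3)
print("cut before layers", cuts, "bottleneck stage latency (ms):", bottleneck)

The sketch minimizes the bottleneck stage because, in pipeline-parallel training, throughput is limited by the slowest stage; more accurate per-layer latency predictions therefore translate directly into better-balanced partitions.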
Pages: 1-15
Number of pages: 14