Pipeline Parallelism With Elastic Averaging

Cited by: 0
Authors
Jang, Bongwon [1 ]
Yoo, In-Chul [1 ]
Yook, Dongsuk [1 ]
Affiliations
[1] Korea University, Artificial Intelligence Laboratory, Department of Computer Science and Engineering, Seoul 02841, Republic of Korea
Abstract
To accelerate the training of massive DNN models on large-scale datasets, distributed training techniques, including data parallelism and model parallelism, have been studied extensively. In particular, pipeline parallelism, which is derived from model parallelism, has been attracting attention. It splits the model parameters across multiple computing nodes and executes multiple mini-batches simultaneously. However, naive pipeline parallelism suffers from weight inconsistency and delayed gradients, as the model parameters used in the forward and backward passes do not match, causing unstable training and poor performance. In this study, we propose a novel pipeline parallelism technique called EA-Pipe to address the weight inconsistency and delayed gradient problems. EA-Pipe applies an elastic averaging method, which has previously been studied in the context of data parallelism, to pipeline parallelism. The proposed method maintains multiple model replicas to solve the weight inconsistency problem, and synchronizes the replicas using an elasticity-based moving average method to mitigate the delayed gradient problem. To verify the efficacy of the proposed method, we conducted three image classification experiments on the CIFAR-10/100 and ImageNet datasets. The experimental results show that EA-Pipe not only accelerates training but also exhibits more stable learning behavior than existing pipeline parallelism techniques. In particular, in the experiments using the CIFAR-100 and ImageNet datasets, EA-Pipe recorded error rates that were 2.58% and 2.19% lower, respectively, than the baseline pipeline parallelization method. © 2013 IEEE
DOI: not available
Pages: 5477-5489
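
The synchronization rule described in the abstract is the elastic averaging (EASGD-style) update known from data-parallel training: each replica is pulled toward a shared center variable while the center drifts toward the average of the replicas. Below is a minimal NumPy sketch of that update under assumed names (elastic_averaging_step, eta, rho are hypothetical); it illustrates the elastic averaging mechanism only, not the actual EA-Pipe implementation or its pipeline scheduling.

# A minimal sketch of the elastic averaging (EASGD-style) update; the names
# below are hypothetical, not from the paper. Per synchronization round,
# for each replica k of a pipeline stage:
#     x_k   <- x_k   - eta * g_k - eta * rho * (x_k - x_bar)
#     x_bar <- x_bar + eta * rho * sum_k (x_k - x_bar)
import numpy as np

def elastic_averaging_step(replicas, center, grads, eta=0.1, rho=0.01):
    """One elastic averaging update over the replicas of one stage."""
    drift = np.zeros_like(center)
    for k, (x, g) in enumerate(zip(replicas, grads)):
        diff = x - center                      # elastic force toward the center
        replicas[k] = x - eta * g - eta * rho * diff
        drift += diff
    center = center + eta * rho * drift        # center moves toward replica mean
    return replicas, center

# Toy usage: 4 replicas of a 3-parameter model, loss ||x||^2 / 2 (gradient x).
rng = np.random.default_rng(0)
center = np.zeros(3)
replicas = [rng.normal(size=3) for _ in range(4)]
for _ in range(100):
    grads = [x.copy() for x in replicas]
    replicas, center = elastic_averaging_step(replicas, center, grads)
print(center)  # replicas and center contract toward the optimum at 0

The elastic term rho * (x_k - x_bar) is what lets replicas keep training on stale local parameters without diverging, which is how an elastic averaging scheme can mitigate the delayed gradient problem the abstract describes.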
Related Papers (50 in total)
  • [21] A Nonlinear Elastic Shape Averaging Approach
    Rumpf, Martin
    Wirth, Benedikt
    SIAM JOURNAL ON IMAGING SCIENCES, 2009, 2 (03) : 800 - 833
  • [22] Averaging anisotropic elastic constant data
    Cowin, SC
    Yang, GY
    JOURNAL OF ELASTICITY, 1997, 46 (02) : 151 - 180
  • [24] AutoPipe: Automatic Configuration of Pipeline Parallelism in Shared GPU Cluster
    Hu, Jinbin
    Wang, Hao
    Liu, Ying
    Wang, Jin
    53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024, 2024, : 443 - 452
  • [25] Restoration of Legacy Parallelism: Transforming Pthreads into Farm and Pipeline Patterns
    Janjic, Vladimir
    Brown, Christopher
    Barwell, Adam D.
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2021, 49 (06) : 886 - 910
  • [26] A Memory Saving Mechanism Based on Data Transferring for Pipeline Parallelism
    Jiang, Wei
    Xu, Rui
    Ma, Sheng
    Wang, Qiong
    Hou, Xiang
    Lu, Hongyi
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 1230 - 1235
  • [27] CAPSlog: Scalable Memory-Centric Partitioning for Pipeline Parallelism
    Dreuning, Henk
    Liokouras, Anna Badia
    Ouyang, Xiaowei
    Bal, Henri E.
    van Nieuwpoort, Rob V.
    2024 32ND EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PDP 2024, 2024, : 17 - 25
  • [28] System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism
    Zhang, Yu
    Li, Zhao-Peng
    Cao, Hui-Fang
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2015, 30 (01) : 57 - 73
  • [29] Optimizing Resource Allocation in Pipeline Parallelism for Distributed DNN Training
    Duan, Yubin
    Wu, Jie
    2022 IEEE 28TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, ICPADS, 2022, : 161 - 168
  • [30] Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview
    Guan, Lei
    Li, Dong-Sheng
    Liang, Ji-Ye
    Wang, Wen-Jian
    Ge, Ke-Shi
    Lu, Xi-Cheng
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (03) : 567 - 584