Pipeline Parallelism With Elastic Averaging

Cited by: 0
Authors
Jang, Bongwon [1 ]
Yoo, In-Chul [1 ]
Yook, Dongsuk [1 ]
Affiliations
[1] Korea University, Artificial Intelligence Laboratory, Department of Computer Science and Engineering, Seoul 02841, Republic of Korea
Abstract
To accelerate the training of massive DNN models on large-scale datasets, distributed training techniques, including data parallelism and model parallelism, have been studied extensively. In particular, pipeline parallelism, which is derived from model parallelism, has been attracting attention. It splits the model parameters across multiple computing nodes and executes multiple mini-batches simultaneously. However, naive pipeline parallelism suffers from weight inconsistency and delayed gradients, as the model parameters used in the forward and backward passes do not match, causing unstable training and poor performance. In this study, we propose a novel pipeline parallelism technique called EA-Pipe to address the weight inconsistency and delayed gradient problems. EA-Pipe applies an elastic averaging method, which has previously been studied in the context of data parallelism, to pipeline parallelism. The proposed method maintains multiple model replicas to solve the weight inconsistency problem, and synchronizes the replicas using an elasticity-based moving average method to mitigate the delayed gradient problem. To verify the efficacy of the proposed method, we conducted three image classification experiments on the CIFAR-10/100 and ImageNet datasets. The experimental results show that EA-Pipe not only accelerates training but also exhibits more stable learning behavior than existing pipeline parallelism techniques. In particular, in the experiments using the CIFAR-100 and ImageNet datasets, EA-Pipe recorded error rates that were 2.58% and 2.19% lower, respectively, than the baseline pipeline parallelization method. © 2013 IEEE
DOI: not available
Pages: 5477-5489
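
The synchronization rule described in the abstract is the elastic averaging (EASGD-style) update known from data-parallel training: each replica is pulled toward a shared center variable while the center drifts toward the average of the replicas. Below is a minimal NumPy sketch of that update under assumed names (elastic_averaging_step, eta, rho are hypothetical); it illustrates the elastic averaging mechanism only, not the actual EA-Pipe implementation or its pipeline scheduling.

# A minimal sketch of the elastic averaging (EASGD-style) update; the names
# below are hypothetical, not from the paper. Per synchronization round,
# for each replica k of a pipeline stage:
#     x_k   <- x_k   - eta * g_k - eta * rho * (x_k - x_bar)
#     x_bar <- x_bar + eta * rho * sum_k (x_k - x_bar)
import numpy as np

def elastic_averaging_step(replicas, center, grads, eta=0.1, rho=0.01):
    """One elastic averaging update over the replicas of one stage."""
    drift = np.zeros_like(center)
    for k, (x, g) in enumerate(zip(replicas, grads)):
        diff = x - center                      # elastic force toward the center
        replicas[k] = x - eta * g - eta * rho * diff
        drift += diff
    center = center + eta * rho * drift        # center moves toward replica mean
    return replicas, center

# Toy usage: 4 replicas of a 3-parameter model, loss ||x||^2 / 2 (gradient x).
rng = np.random.default_rng(0)
center = np.zeros(3)
replicas = [rng.normal(size=3) for _ in range(4)]
for _ in range(100):
    grads = [x.copy() for x in replicas]
    replicas, center = elastic_averaging_step(replicas, center, grads)
print(center)  # replicas and center contract toward the optimum at 0

The elastic term rho * (x_k - x_bar) is what lets replicas keep training on stale local parameters without diverging, which is how an elastic averaging scheme can mitigate the delayed gradient problem the abstract describes.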
Related Papers (50 in total)
  • [21] A Nonlinear Elastic Shape Averaging Approach
    Rumpf, Martin
    Wirth, Benedikt
    SIAM JOURNAL ON IMAGING SCIENCES, 2009, 2 (03) : 800 - 833
  • [22] Averaging anisotropic elastic constant data
    Cowin, SC
    Yang, GY
    JOURNAL OF ELASTICITY, 1997, 46 (02) : 151 - 180
  • [24] AutoPipe: Automatic Configuration of Pipeline Parallelism in Shared GPU Cluster
    Hu, Jinbin
    Wang, Hao
    Liu, Ying
    Wang, Jin
    53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024, 2024, : 443 - 452
  • [25] Restoration of Legacy Parallelism: Transforming Pthreads into Farm and Pipeline Patterns
    Janjic, Vladimir
    Brown, Christopher
    Barwell, Adam D.
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2021, 49 (06) : 886 - 910
  • [26] A Memory Saving Mechanism Based on Data Transferring for Pipeline Parallelism
    Jiang, Wei
    Xu, Rui
    Ma, Sheng
    Wang, Qiong
    Hou, Xiang
    Lu, Hongyi
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 1230 - 1235
  • [27] CAPSlog: Scalable Memory-Centric Partitioning for Pipeline Parallelism
    Dreuning, Henk
    Liokouras, Anna Badia
    Ouyang, Xiaowei
    Bal, Henri E.
    van Nieuwpoort, Rob V.
    2024 32ND EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PDP 2024, 2024, : 17 - 25
  • [28] System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism
    Zhang, Yu
    Li, Zhao-Peng
    Cao, Hui-Fang
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2015, 30 (01) : 57 - 73
  • [29] Optimizing Resource Allocation in Pipeline Parallelism for Distributed DNN Training
    Duan, Yubin
    Wu, Jie
    2022 IEEE 28TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, ICPADS, 2022, : 161 - 168
  • [30] Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview
    Guan, Lei
    Li, Dong-Sheng
    Liang, Ji-Ye
    Wang, Wen-Jian
    Ge, Ke-Shi
    Lu, Xi-Cheng
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (03) : 567 - 584