Pipeline Parallelism With Elastic Averaging

Cited by: 0
Authors
Jang, Bongwon [1 ]
Yoo, In-Chul [1 ]
Yook, Dongsuk [1 ]
Affiliation
[1] Korea University, Artificial Intelligence Laboratory, Department of Computer Science and Engineering, Seoul 02841, Republic of Korea
Abstract
To accelerate the training of massive DNN models on large-scale datasets, distributed training techniques, including data parallelism and model parallelism, have been extensively studied. In particular, pipeline parallelism, which is derived from model parallelism, has been attracting attention. It splits the model parameters across multiple computing nodes and executes multiple mini-batches simultaneously. However, naive pipeline parallelism suffers from weight inconsistency and delayed gradients, as the model parameters used in the forward and backward passes do not match, causing unstable training and low performance. In this study, we propose a novel pipeline parallelism technique called EA-Pipe to address the weight inconsistency and delayed gradient problems. EA-Pipe applies an elastic averaging method, which has been studied in the context of data parallelism, to pipeline parallelism. The proposed method maintains multiple model replicas to solve the weight inconsistency problem, and synchronizes the model replicas using an elasticity-based moving average method to mitigate the delayed gradient problem. To verify the efficacy of the proposed method, we conducted three image classification experiments on the CIFAR-10/100 and ImageNet datasets. The experimental results show that EA-Pipe not only accelerates training but also exhibits more stable learning behavior than existing pipeline parallelism techniques. In particular, in the experiments on the CIFAR-100 and ImageNet datasets, EA-Pipe achieved error rates 2.58% and 2.19% lower, respectively, than the baseline pipeline parallelization method. © 2013 IEEE
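The elastic averaging scheme the abstract refers to is, in the data-parallel setting, the EASGD-style update: each model replica takes its gradient step plus an elastic pull toward a center variable, while the center drifts toward the average of the replicas. The sketch below is a minimal illustration of that update rule, not the paper's exact EA-Pipe algorithm; the function name, hyperparameters (`lr`, `rho`), and scalar setup are assumptions for demonstration.

```python
import numpy as np

def elastic_averaging_step(replicas, center, grads, lr=0.1, rho=0.5):
    """One EASGD-style elastic-averaging update.

    Each replica takes a gradient step plus an elastic attraction
    toward the center variable; the center is pulled toward the
    replicas with the same elasticity coefficient rho.
    """
    new_replicas = []
    for x, g in zip(replicas, grads):
        # Replica update: gradient step + elastic pull toward the center.
        new_replicas.append(x - lr * (g + rho * (x - center)))
    # Center update: moves toward the (sum of) replica deviations.
    new_center = center + lr * rho * sum(x - center for x in replicas)
    return new_replicas, new_center

# Two replicas at 1.0 and 3.0, center at 0.0, zero gradients:
# the replicas are pulled toward the center and the center toward them.
reps, c = elastic_averaging_step(
    [np.array([1.0]), np.array([3.0])],
    np.array([0.0]),
    [np.array([0.0]), np.array([0.0])],
)
```

With zero gradients the update only exchanges "elastic forces": the replicas move toward the center and the center toward the replicas, which is the moving-average mechanism EA-Pipe reuses to keep its per-stage model replicas synchronized.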
DOI: not available
Pages: 5477 - 5489