Efficient Training of Large-Scale Neural Networks Using Linear Pipeline Broadcast

Author affiliations
[1] Department of Big Data Science, University of Science and Technology, Daejeon 34112, Republic of Korea
[2] Not specified, 34141, Republic of Korea
[3] Not specified, 34112, Republic of Korea
DOI
10.1109/ACCESS.2024.3492314
Abstract
Recently, the adoption of deep learning models across domains and tasks has increased, and with it the number of model layers and parameters needed to achieve the required performance. The memory required for model training has grown accordingly, advancing the adoption and exploration of distributed training. Even under distributed training, model parallelism techniques still require a large amount of memory. Among them, layer pipelining, which divides the model into layers and assigns the resulting stages to devices, has attracted interest. Activation recomputation is a popular method for using pipeline parallelism efficiently while minimizing memory consumption; however, it can decrease training throughput because of redundant operations. Therefore, this study introduces a forward propagation technique that employs a linear pipeline broadcast method to decrease memory consumption while mitigating the reduction in training throughput caused by partially integrating recomputation into PipeDream-Flush. The proposed broadcast-based forward propagation offsets the overhead of activation recomputation by optimizing network communication between pipeline stages and reducing bubbles in the warm-up phase of the pipeline. Experimental results demonstrate that, compared with PipeDream-Flush, the proposed technique reduces memory consumption by approximately 36.0% at peak training throughput for GPT-2 without a significant decrease in training throughput, and achieves 14.6% and 12.6% higher peak training throughput for the ResNet152 and VGG19 models, respectively, while consuming 30.1% and 12.0% less memory. © 2013 IEEE.
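To make the communication pattern concrete, below is a minimal sketch of a generic linear pipeline broadcast, the kind of collective that the abstract's broadcast-based forward propagation builds on. It is an illustration under assumed names (NUM_STAGES, NUM_CHUNKS, run_stage, linear_pipeline_broadcast), not the paper's implementation: a message (e.g., an activation tensor) is split into chunks that are streamed along a chain of stages, so each stage can forward chunk k downstream while still receiving later chunks from upstream. Python threads and queues stand in for the point-to-point links (e.g., send/recv between pipeline ranks) that a real training framework would use.

```python
# Minimal, self-contained sketch of a linear pipeline broadcast along a chain
# of stages 0 -> 1 -> ... -> P-1. The message is split into chunks so that a
# stage can forward chunk k to its successor while later chunks are still in
# flight from its predecessor. All names here are illustrative, not taken
# from the paper.

import threading
import queue

NUM_STAGES = 4   # pipeline depth (hypothetical)
NUM_CHUNKS = 8   # how finely the message is split (hypothetical)

def run_stage(rank, in_q, out_q, received):
    """Receive chunks from the previous stage and forward them immediately."""
    for _ in range(NUM_CHUNKS):
        chunk = in_q.get()          # "recv" from stage rank-1
        received[rank].append(chunk)
        if out_q is not None:       # the last stage has no successor
            out_q.put(chunk)        # "send" to stage rank+1 right away

def linear_pipeline_broadcast(message):
    # One queue per link in the chain: stage i -> stage i+1.
    links = [queue.Queue() for _ in range(NUM_STAGES - 1)]
    received = {r: [] for r in range(NUM_STAGES)}
    received[0] = list(message)     # the root already holds the data

    threads = []
    for rank in range(1, NUM_STAGES):
        in_q = links[rank - 1]
        out_q = links[rank] if rank < NUM_STAGES - 1 else None
        t = threading.Thread(target=run_stage, args=(rank, in_q, out_q, received))
        t.start()
        threads.append(t)

    # The root (stage 0) streams the message chunk by chunk into the chain.
    for chunk in message:
        links[0].put(chunk)

    for t in threads:
        t.join()
    return received

if __name__ == "__main__":
    # Split a toy "activation" into NUM_CHUNKS pieces and broadcast it.
    activation = [f"chunk-{k}" for k in range(NUM_CHUNKS)]
    result = linear_pipeline_broadcast(activation)
    assert all(result[r] == activation for r in range(NUM_STAGES))
    print("every stage received the full message via the chain")
```

With S chunks and P stages on the chain, the standard analysis of such chained broadcasts gives a completion time of roughly (S + P - 2) chunk-transfer times instead of (P - 1) full-message transfers for a non-pipelined chain, which is the kind of communication overlap that can help hide inter-stage transfers and shrink bubbles during the pipeline warm-up phase mentioned in the abstract.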
Pages: 165653-165662