Efficient Training of Large-Scale Neural Networks Using Linear Pipeline Broadcast

Author affiliations
[1] Department of Big Data Science, University of Science and Technology, Daejeon 34112, Republic of Korea
[2] Not specified, 34141, Republic of Korea
[3] Not specified, 34112, Republic of Korea
DOI
10.1109/ACCESS.2024.3492314
Abstract
Recently, the adoption of deep learning models across domains and tasks has increased, and with it the number of model layers and parameters needed to achieve the required performance. The memory required for model training has grown accordingly, advancing the adoption and exploration of distributed training. Even under distributed training, model parallelism techniques still require a large amount of memory. Among them, layer pipelining, which divides the model into layers and assigns the resulting stages to devices, has attracted interest. Activation recomputation is a popular method for using pipeline parallelism efficiently while minimizing memory consumption; however, it can decrease training throughput because of redundant operations. Therefore, this study introduces a forward propagation technique that employs a linear pipeline broadcast method to decrease memory consumption while mitigating the reduction in training throughput caused by partially integrating recomputation into PipeDream-Flush. The proposed broadcast-based forward propagation offsets the overhead of activation recomputation by optimizing network communication between pipeline stages and reducing bubbles in the warm-up phase of the pipeline. Experimental results demonstrate that, compared with PipeDream-Flush, the proposed technique reduces memory consumption by approximately 36.0% at peak training throughput for GPT-2 without a significant decrease in training throughput, and achieves 14.6% and 12.6% higher peak training throughput for the ResNet152 and VGG19 models, respectively, while consuming 30.1% and 12.0% less memory. © 2013 IEEE.
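To make the communication pattern concrete, below is a minimal sketch of a generic linear pipeline broadcast, the kind of collective that the abstract's broadcast-based forward propagation builds on. It is an illustration under assumed names (NUM_STAGES, NUM_CHUNKS, run_stage, linear_pipeline_broadcast), not the paper's implementation: a message (e.g., an activation tensor) is split into chunks that are streamed along a chain of stages, so each stage can forward chunk k downstream while still receiving later chunks from upstream. Python threads and queues stand in for the point-to-point links (e.g., send/recv between pipeline ranks) that a real training framework would use.

```python
# Minimal, self-contained sketch of a linear pipeline broadcast along a chain
# of stages 0 -> 1 -> ... -> P-1. The message is split into chunks so that a
# stage can forward chunk k to its successor while later chunks are still in
# flight from its predecessor. All names here are illustrative, not taken
# from the paper.

import threading
import queue

NUM_STAGES = 4   # pipeline depth (hypothetical)
NUM_CHUNKS = 8   # how finely the message is split (hypothetical)

def run_stage(rank, in_q, out_q, received):
    """Receive chunks from the previous stage and forward them immediately."""
    for _ in range(NUM_CHUNKS):
        chunk = in_q.get()          # "recv" from stage rank-1
        received[rank].append(chunk)
        if out_q is not None:       # the last stage has no successor
            out_q.put(chunk)        # "send" to stage rank+1 right away

def linear_pipeline_broadcast(message):
    # One queue per link in the chain: stage i -> stage i+1.
    links = [queue.Queue() for _ in range(NUM_STAGES - 1)]
    received = {r: [] for r in range(NUM_STAGES)}
    received[0] = list(message)     # the root already holds the data

    threads = []
    for rank in range(1, NUM_STAGES):
        in_q = links[rank - 1]
        out_q = links[rank] if rank < NUM_STAGES - 1 else None
        t = threading.Thread(target=run_stage, args=(rank, in_q, out_q, received))
        t.start()
        threads.append(t)

    # The root (stage 0) streams the message chunk by chunk into the chain.
    for chunk in message:
        links[0].put(chunk)

    for t in threads:
        t.join()
    return received

if __name__ == "__main__":
    # Split a toy "activation" into NUM_CHUNKS pieces and broadcast it.
    activation = [f"chunk-{k}" for k in range(NUM_CHUNKS)]
    result = linear_pipeline_broadcast(activation)
    assert all(result[r] == activation for r in range(NUM_STAGES))
    print("every stage received the full message via the chain")
```

With S chunks and P stages on the chain, the standard analysis of such chained broadcasts gives a completion time of roughly (S + P - 2) chunk-transfer times instead of (P - 1) full-message transfers for a non-pipelined chain, which is the kind of communication overlap that can help hide inter-stage transfers and shrink bubbles during the pipeline warm-up phase mentioned in the abstract.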
Pages: 165653-165662