Efficient Training of Large-Scale Neural Networks Using Linear Pipeline Broadcast

Author affiliations
[1] University of Science and Technology, Department of Big Data Science, Daejeon 34112, Korea, Republic of
[2] Not specified, 34141, Korea, Republic of
[3] Not specified, 34112, Korea, Republic of
DOI: 10.1109/ACCESS.2024.3492314
Abstract
Recently, deep learning models have been adopted in a growing number of domains and tasks, and the number of layers and parameters needed to achieve the required performance has grown correspondingly. The memory required for model training has therefore increased, driving the adoption and exploration of distributed training. Model parallelism techniques generally require a large amount of memory during distributed training. Among them, layer pipelining, which divides the model into layers and assigns the resulting stages to devices, has attracted interest. Activation recomputation is a popular method for using pipeline parallelism efficiently while minimizing memory consumption; however, it can reduce training throughput because of the redundant operations it introduces. Therefore, this study introduces a forward propagation technique that uses a linear pipeline broadcast method and partially integrates recomputation into PipeDream-Flush, decreasing memory consumption while mitigating the reduction in training throughput. The proposed broadcast-based forward propagation offsets the overhead of activation recomputation by optimizing network communication between pipeline stages and reducing bubbles in the warm-up phase of the pipeline. Experimental results demonstrate that, at peak training throughput for GPT2, the proposed technique reduces memory consumption by approximately 36.0% compared with PipeDream-Flush, without a significant decrease in training throughput. Compared with PipeDream-Flush, the proposed method also achieved 14.6% and 12.6% higher peak training throughput for the ResNet152 and VGG19 models, respectively, while consuming 30.1% and 12.0% less memory. © 2013 IEEE.
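The abstract names the linear pipeline broadcast without spelling out why a pipelined chain can cut communication time. As a rough intuition only, the following sketch applies a standard alpha-beta cost model to a chain broadcast in which the message is split into k chunks and each rank forwards a chunk to its successor as soon as it arrives. This is a generic textbook model, not the authors' implementation: the function names, the alpha (per-message latency) and beta (seconds per byte) parameters, the assumption that every rank can receive and send at the same time, and the idea of treating each pipeline stage as one rank in the chain are all illustrative assumptions.

```python
"""Hypothetical alpha-beta cost model of a linear (chain) pipelined broadcast.

Assumptions (not from the paper): p ranks form a chain 0 -> 1 -> ... -> p-1,
rank 0 holds the message, links are full duplex, and the message is split
into k equal chunks that are forwarded as soon as they arrive.
"""


def chunk_time(message_bytes: float, k: int, alpha: float, beta: float) -> float:
    """Time to push one chunk across one link: latency plus transfer time."""
    return alpha + beta * message_bytes / k


def linear_pipeline_broadcast_time(p: int, message_bytes: float, k: int,
                                   alpha: float, beta: float) -> float:
    """The first chunk needs p-1 hops to reach the last rank; each of the
    remaining k-1 chunks arrives one chunk time later, so the total is
    (p + k - 2) chunk times."""
    if p < 2:
        return 0.0  # nothing to send with a single rank
    return (p + k - 2) * chunk_time(message_bytes, k, alpha, beta)


def sequential_root_broadcast_time(p: int, message_bytes: float,
                                   alpha: float, beta: float) -> float:
    """Baseline: the root sends the whole message to each other rank in turn."""
    return (p - 1) * (alpha + beta * message_bytes)


def best_chunk_count(p: int, message_bytes: float, alpha: float, beta: float,
                     max_k: int = 1024) -> int:
    """Brute-force the chunk count that minimizes the modeled chain time."""
    return min(range(1, max_k + 1),
               key=lambda k: linear_pipeline_broadcast_time(
                   p, message_bytes, k, alpha, beta))


if __name__ == "__main__":
    # Hypothetical numbers: 8 stages, 64 MiB payload, 10 us latency, 10 GB/s links.
    p, size, alpha, beta = 8, 64 * 2**20, 10e-6, 1 / 10e9
    k = best_chunk_count(p, size, alpha, beta)
    print("best k:", k)
    print("chain time (s):", linear_pipeline_broadcast_time(p, size, k, alpha, beta))
    print("sequential time (s):", sequential_root_broadcast_time(p, size, alpha, beta))
```

With k = 1 the chain degenerates to p - 1 full-message hops, matching the sequential baseline; larger k overlaps the hops, and under this model the time is minimized near k ≈ sqrt((p - 2)·beta·m / alpha), where m is the message size, because each extra chunk adds one more latency term.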
Pages: 165653-165662