Efficient Training of Large-Scale Neural Networks Using Linear Pipeline Broadcast

Cited by: 0
Authors
[1] University of Science and Technology, Department of Big Data Science, Daejeon 34112, Republic of Korea
[2] Affiliation not specified, 34141, Republic of Korea
[3] Affiliation not specified, 34112, Republic of Korea
DOI: 10.1109/ACCESS.2024.3492314
Abstract
Recently, deep learning models have been adopted in more domains and for more tasks, and the number of layers and parameters needed to reach the required performance has grown accordingly. The memory required for model training has therefore increased, driving the adoption and study of distributed training. Model parallelism techniques generally demand a large amount of memory during distributed training. Among them, layer pipelining, which divides the model into layers and places the resulting stages on separate devices, has attracted interest. Activation recomputation is a popular method for exploiting pipeline parallelism while minimizing memory consumption; however, its redundant forward operations can reduce training throughput. This study therefore introduces a forward propagation technique that employs a linear pipeline broadcast to decrease memory consumption while mitigating the throughput loss incurred by partially integrating recomputation into PipeDream-Flush. The proposed broadcast-based forward propagation offsets the overhead of activation recomputation by optimizing network communication between pipeline stages and reducing bubbles in the warm-up phase of the pipeline. Experimental results demonstrate that, at peak training throughput for GPT2, the proposed technique reduces memory consumption by approximately 36.0% compared with PipeDream-Flush, without a significant decrease in training throughput. Compared with PipeDream-Flush, the proposed method also achieved 14.6% and 12.6% higher peak training throughput for the ResNet152 and VGG19 models, respectively, while consuming 30.1% and 12.0% less memory.
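To make the two ideas combined in the abstract concrete, the sketch below shows one pipeline stage that recomputes its internal activations with torch.utils.checkpoint and relays its output to the next stage over point-to-point sends, so the stages form a linear chain. It is a minimal sketch, not the authors' implementation of broadcast-based forward propagation: it assumes a torch.distributed process group in which rank i hosts stage i, assumes all inter-stage tensors share the micro-batch's shape, and the names run_stage and stage_module are hypothetical.

# Minimal sketch of one pipeline stage: (a) activation recomputation inside the
# stage to save memory, and (b) a chained point-to-point send so activations flow
# stage-to-stage like a linear pipeline broadcast. Illustrative only; assumes
# rank i hosts stage i and every inter-stage tensor has the micro-batch's shape.
import torch
import torch.distributed as dist
from torch.utils.checkpoint import checkpoint

def run_stage(stage_module, micro_batch, rank, world_size, device="cuda"):
    if rank == 0:
        x = micro_batch.to(device)                      # first stage reads the real input
    else:
        x = torch.empty_like(micro_batch, device=device)
        dist.recv(x, src=rank - 1)                      # receive activations from the previous stage
        x.requires_grad_(True)                          # allow gradients to flow back to this boundary

    # Activation recomputation: intermediate activations inside the stage are
    # discarded during the forward pass and rebuilt during backward.
    y = checkpoint(stage_module, x, use_reentrant=False)

    if rank < world_size - 1:
        dist.send(y.detach(), dst=rank + 1)             # relay to the next stage (linear chain)
    return y

In a PipeDream-Flush-style schedule this send/receive chain is repeated once per micro-batch; the paper's contribution targets the cost of that inter-stage communication and the warm-up bubbles around it.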
Pages: 165653-165662