Layer-parallel training of residual networks with auxiliary variable networks

Cited: 0
Authors
Sun, Qi [1 ,2 ]
Dong, Hexin [3 ]
Chen, Zewei [4 ]
Sun, Jiacheng [4 ]
Li, Zhenguo [4 ]
Dong, Bin [3 ,5 ]
Affiliations
[1] Tongji Univ, Sch Math Sci, 1239 Siping Rd, Shanghai, Peoples R China
[2] Tongji Univ, Key Lab Intelligent Comp & Applicat, Minist Educ, Shanghai, Peoples R China
[3] Peking Univ, Beijing Int Ctr Math Res, Beijing, Peoples R China
[4] Huawei Noah's Ark Lab, Shenzhen, Peoples R China
[5] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
auxiliary variable network; deep residual networks; optimal control of neural ordinary differential equations; penalty and augmented Lagrangian methods; synchronous layer-parallel training;
DOI
10.1002/num.23147
Chinese Library Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Gradient-based methods for training residual networks (ResNets) typically require a forward pass of the input data followed by back-propagation of the error gradient to update model parameters, which becomes increasingly time-consuming as the network grows deeper. To break this algorithmic locking and exploit synchronous module parallelism in both the forward and backward modes, auxiliary-variable methods have emerged but suffer from communication overhead and a lack of data augmentation. By trading off the recomputation and storage of auxiliary variables, this work proposes a joint learning framework for training realistic ResNets across multiple compute devices. Specifically, the input data of each processor is generated from its low-capacity auxiliary network (AuxNet), which permits the use of data augmentation and realizes forward unlocking. The backward passes are then executed in parallel, each with a local loss function derived from the penalty or augmented Lagrangian (AL) method. Finally, the AuxNet is adjusted to reproduce the updated auxiliary variables through an end-to-end training process. We demonstrate the effectiveness of our method on ResNets and WideResNets across the CIFAR-10, CIFAR-100, and ImageNet datasets, achieving speedup over the traditional layer-serial training approach while maintaining comparable test accuracy.
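The penalty-based decoupling described in the abstract can be illustrated with a short PyTorch sketch. This is not the authors' code: the stage and AuxNet architectures, the penalty weight rho, and the single shared optimizer are illustrative assumptions, and the end-to-end retraining of each AuxNet to reproduce the updated auxiliary variables is omitted. It only shows how per-stage inputs from auxiliary networks plus quadratic penalty terms yield local losses whose forward and backward passes do not depend on neighboring stages.

```python
# Minimal sketch (assumed architecture, not the paper's implementation) of
# penalty-based layer-parallel training: each ResNet stage reads its input
# from a low-capacity AuxNet instead of from the previous stage.
import torch
import torch.nn as nn


class Stage(nn.Module):
    """One residual stage: two conv layers with an identity skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))


class AuxNet(nn.Module):
    """Low-capacity network predicting a stage's input activation directly from
    the raw (augmented) image batch, so the stage never waits for its
    predecessor (forward unlocking)."""
    def __init__(self, in_ch, ch):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.net(x)


K, ch, n_cls, rho = 3, 16, 10, 1.0               # stages, width, classes, penalty weight (assumed)
stem = nn.Conv2d(3, ch, 3, padding=1)            # maps images to the first stage's width
stages = [Stage(ch) for _ in range(K)]
auxnets = [AuxNet(3, ch) for _ in range(K - 1)]  # one AuxNet per downstream stage
head = nn.Linear(ch, n_cls)

params = [p for m in [stem, head] + stages + auxnets for p in m.parameters()]
opt = torch.optim.SGD(params, lr=0.1)

x = torch.randn(8, 3, 32, 32)                    # dummy augmented image batch
y = torch.randint(0, n_cls, (8,))

# One synchronous, decoupled training step (stages could live on separate devices).
opt.zero_grad()
inputs = [stem(x)] + [aux(x) for aux in auxnets]    # each stage gets its own input
outputs = [stages[k](inputs[k]) for k in range(K)]  # K independent forward passes

# Local losses: classification loss on the last stage, plus quadratic penalties
# coupling every stage's output to the next stage's auxiliary input.
loss = nn.functional.cross_entropy(head(outputs[-1].mean(dim=(2, 3))), y)
for k in range(K - 1):
    loss = loss + rho * (outputs[k] - inputs[k + 1]).pow(2).mean()

loss.backward()   # gradients never cross stage boundaries, so backward is also decoupled
opt.step()
```

Because each stage's input comes from its own AuxNet rather than from the previous stage, the K forward/backward computations in this sketch share no computational graph and could, in principle, run concurrently on separate processors; an augmented Lagrangian variant would add multiplier terms to the same quadratic penalties.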
Pages: 25