Layer-parallel training of residual networks with auxiliary variable networks

Cited: 0
Authors
Sun, Qi [1 ,2 ]
Dong, Hexin [3 ]
Chen, Zewei [4 ]
Sun, Jiacheng [4 ]
Li, Zhenguo [4 ]
Dong, Bin [3 ,5 ]
Affiliations
[1] Tongji Univ, Sch Math Sci, 1239 Siping Rd, Shanghai, Peoples R China
[2] Tongji Univ, Key Lab Intelligent Comp & Applicat, Minist Educ, Shanghai, Peoples R China
[3] Peking Univ, Beijing Int Ctr Math Res, Beijing, Peoples R China
[4] Huawei Noah's Ark Lab, Shenzhen, Peoples R China
[5] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
auxiliary variable network; deep residual networks; optimal control of neural ordinary differential equations; penalty and augmented Lagrangian methods; synchronous layer-parallel training;
DOI
10.1002/num.23147
Chinese Library Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Gradient-based methods for training residual networks (ResNets) typically require a forward pass of the input data followed by back-propagation of the error gradient to update model parameters, which becomes increasingly time-consuming as the network grows deeper. To break this algorithmic locking and exploit synchronous module parallelism in both the forward and backward modes, auxiliary-variable methods have emerged but suffer from communication overhead and a lack of data augmentation. By trading off the recomputation and storage of auxiliary variables, this work proposes a joint learning framework for training realistic ResNets across multiple compute devices. Specifically, the input data of each processor is generated from its low-capacity auxiliary network (AuxNet), which permits the use of data augmentation and realizes forward unlocking. The backward passes are then executed in parallel, each with a local loss function derived from the penalty or augmented Lagrangian (AL) method. Finally, the AuxNet is adjusted to reproduce the updated auxiliary variables through an end-to-end training process. We demonstrate the effectiveness of our method on ResNets and WideResNets across the CIFAR-10, CIFAR-100, and ImageNet datasets, achieving speedup over the traditional layer-serial training approach while maintaining comparable test accuracy.
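The penalty-based decoupling described in the abstract can be illustrated with a short PyTorch sketch. This is not the authors' code: the stage and AuxNet architectures, the penalty weight rho, and the single shared optimizer are illustrative assumptions, and the end-to-end retraining of each AuxNet to reproduce the updated auxiliary variables is omitted. It only shows how per-stage inputs from auxiliary networks plus quadratic penalty terms yield local losses whose forward and backward passes do not depend on neighboring stages.

```python
# Minimal sketch (assumed architecture, not the paper's implementation) of
# penalty-based layer-parallel training: each ResNet stage reads its input
# from a low-capacity AuxNet instead of from the previous stage.
import torch
import torch.nn as nn


class Stage(nn.Module):
    """One residual stage: two conv layers with an identity skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))


class AuxNet(nn.Module):
    """Low-capacity network predicting a stage's input activation directly from
    the raw (augmented) image batch, so the stage never waits for its
    predecessor (forward unlocking)."""
    def __init__(self, in_ch, ch):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.net(x)


K, ch, n_cls, rho = 3, 16, 10, 1.0               # stages, width, classes, penalty weight (assumed)
stem = nn.Conv2d(3, ch, 3, padding=1)            # maps images to the first stage's width
stages = [Stage(ch) for _ in range(K)]
auxnets = [AuxNet(3, ch) for _ in range(K - 1)]  # one AuxNet per downstream stage
head = nn.Linear(ch, n_cls)

params = [p for m in [stem, head] + stages + auxnets for p in m.parameters()]
opt = torch.optim.SGD(params, lr=0.1)

x = torch.randn(8, 3, 32, 32)                    # dummy augmented image batch
y = torch.randint(0, n_cls, (8,))

# One synchronous, decoupled training step (stages could live on separate devices).
opt.zero_grad()
inputs = [stem(x)] + [aux(x) for aux in auxnets]    # each stage gets its own input
outputs = [stages[k](inputs[k]) for k in range(K)]  # K independent forward passes

# Local losses: classification loss on the last stage, plus quadratic penalties
# coupling every stage's output to the next stage's auxiliary input.
loss = nn.functional.cross_entropy(head(outputs[-1].mean(dim=(2, 3))), y)
for k in range(K - 1):
    loss = loss + rho * (outputs[k] - inputs[k + 1]).pow(2).mean()

loss.backward()   # gradients never cross stage boundaries, so backward is also decoupled
opt.step()
```

Because each stage's input comes from its own AuxNet rather than from the previous stage, the K forward/backward computations in this sketch share no computational graph and could, in principle, run concurrently on separate processors; an augmented Lagrangian variant would add multiplier terms to the same quadratic penalties.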
Pages: 25