An Adaptive Learning Rate Schedule for SIGNSGD Optimizer in Neural Networks

Cited by: 0
Authors
Kang Wang
Tao Sun
Yong Dou
Affiliation
[1] National University of Defense Technology, The National Laboratory for Parallel and Distributed Processing, School of Computer
Source
Neural Processing Letters | 2022 / Volume 54
Keywords
SIGNSGD optimizer; An adaptive learning rate strategy; Communication; Fast convergence; Neural networks
Abstract
SIGNSGD dramatically improves the efficiency of training large neural networks by transmitting only the sign of each minibatch stochastic gradient, which compresses gradient communication while retaining a convergence rate on par with standard stochastic gradient descent (SGD). Meanwhile, the learning rate plays a vital role in training neural networks, but existing learning rate strategies face the following problems: (1) learning rate decay methods produce small learning rates that slow convergence, and they require extra hyper-parameters beyond the initial learning rate, increasing manual tuning; (2) adaptive gradient algorithms generalize poorly and also rely on additional hyper-parameters; (3) generating learning rates via two-level optimization models is difficult and time-consuming during training. To this end, we propose, for the first time, a novel adaptive learning rate schedule for training neural networks with the SIGNSGD optimizer. Our method builds on the theoretical observation that the upper bound on the convergence rate is minimized with respect to the current learning rate at each iteration; consequently, the current learning rate can be written as a closed-form expression that depends only on the historical learning rates. Given a single initial value, learning rates for the different training stages can then be obtained adaptively. The proposed method has the following advantages: (1) it is automatic and introduces no hyper-parameters other than one initial value, reducing manual involvement; (2) it converges faster and outperforms standard SGD; (3) it lets neural networks reach better performance with fewer gradient communication bits. Three numerical experiments are conducted on different neural networks with three public datasets (MNIST, CIFAR-10 and CIFAR-100), and the results demonstrate the efficiency of the proposed approach.
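The abstract's core idea can be illustrated with a short sketch: SIGNSGD updates each parameter by the sign of its minibatch gradient scaled by the current learning rate, and the schedule derives each new learning rate from the history of previous ones, starting from a single initial value. The sketch below is only illustrative; the recurrence in adaptive_lr (a square-root decay of the initial value over the accumulated history) is an assumed placeholder, not the closed-form expression derived in the paper.

    import numpy as np

    def signsgd_step(params, grad, lr):
        # SIGNSGD update: move each parameter by lr times the sign of its gradient,
        # so only the sign of the minibatch gradient needs to be communicated.
        return params - lr * np.sign(grad)

    def adaptive_lr(lr_history):
        # Placeholder schedule: derive the next learning rate from past ones.
        # The paper derives a closed-form expression over historical learning rates;
        # this square-root decay of the initial value is an assumption for illustration.
        return lr_history[0] / np.sqrt(len(lr_history) + 1)

    # Toy objective f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    w = np.array([2.0, -3.0, 1.5])
    lr_history = [0.1]            # the initial learning rate is the only hyper-parameter
    for step in range(100):
        grad = w                  # a minibatch stochastic gradient would be used here
        w = signsgd_step(w, grad, lr_history[-1])
        lr_history.append(adaptive_lr(lr_history))

Starting from the single initial value 0.1, the loop produces the entire sequence of learning rates without further hyper-parameters, mirroring the property claimed in the abstract.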
Pages: 803 - 816
Number of pages: 13
Related Papers
50 records in total
  • [21] Adaptive competitive learning neural networks
    Abas, Ahmed R.
    EGYPTIAN INFORMATICS JOURNAL, 2013, 14 (03) : 183 - 194
  • [22] Adaptive hybrid learning for neural networks
    Smithies, R
    Salhi, S
    Queen, N
    NEURAL COMPUTATION, 2004, 16 (01) : 139 - 157
  • [23] Learning Neural Networks with Adaptive Regularization
    Zhao, Han
    Tsai, Yao-Hung Hubert
    Salakhutdinov, Ruslan
    Gordon, Geoffrey J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [24] The Optimization of Learning Rate for Neural Networks
    Huang, Weizhe
    Chen, Chi-Hua
    ASIA-PACIFIC JOURNAL OF CLINICAL ONCOLOGY, 2023, 19 : 17 - 17
  • [25] Continuous Action Learning Automata Optimizer for training Artificial Neural Networks
    Lindsay, James
    Givigi, Sidney
    2023 IEEE INTERNATIONAL SYSTEMS CONFERENCE (SYSCON), 2023
  • [26] PID controller-based adaptive gradient optimizer for deep neural networks
    Dai, Mingjun
    Zhang, Zelong
    Lai, Xiong
    Lin, Xiaohui
    Wang, Hui
    IET CONTROL THEORY AND APPLICATIONS, 2023, 17 (15) : 2032 - 2037
  • [27] LALR: Theoretical and Experimental validation of Lipschitz Adaptive Learning Rate in Regression and Neural Networks
    Saha, Snehanshu
    Prashanth, Tejas
    Aralihalli, Suraj
    Basarkod, Sumedh
    Sudarshan, T. S. B.
    Dhavala, Soma S.
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [28] Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks
    Ida, Yasutoshi
    Fujiwara, Yasuhiro
    Iwamura, Sotetsu
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1923 - 1929
  • [29] Neural Networks for Solving the Superposition Problem Using Approximation Method and Adaptive Learning Rate
    Dagba, Theophile K.
    Adanhounme, Villevo
    Adedjouma, Semiyou A.
    AGENT AND MULTI-AGENT SYSTEMS: TECHNOLOGIES AND APPLICATIONS, PT II, PROCEEDINGS, 2010, 6071 : 92 - +
  • [30] Learning Adaptive Gradients for Binary Neural Networks
    Wang Z.-W.
    Lu J.-W.
    Zhou J.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2023, 51 (02): : 257 - 266