An Adaptive Learning Rate Schedule for SIGNSGD Optimizer in Neural Networks

Cited by: 0
Authors
Kang Wang
Tao Sun
Yong Dou
Affiliations
[1] National University of Defense Technology, The National Laboratory for Parallel and Distributed Processing, School of Computer
Source
Neural Processing Letters | 2022 / Volume 54
Keywords
SIGNSGD optimizer; An adaptive learning rate strategy; Communication; Fast convergence; Neural networks
DOI
Not available
Abstract
SIGNSGD can dramatically improve the efficiency of training large neural networks by transmitting only the sign of each minibatch stochastic gradient, which compresses gradient communication while retaining a convergence rate at the level of standard stochastic gradient descent (SGD). Meanwhile, the learning rate plays a vital role in training neural networks, but existing learning rate strategies face the following problems: (1) learning rate decay methods produce small learning rates that slow convergence, and they require extra hyper-parameters beyond the initial learning rate, increasing manual involvement; (2) adaptive gradient algorithms generalize poorly and also rely on additional hyper-parameters; (3) generating learning rates via two-level optimization models is difficult and time-consuming during training. To this end, we propose, for the first time, a novel adaptive learning rate schedule for neural network training with the SIGNSGD optimizer. Our method builds on the theoretical observation that the upper bound on the convergence rate is minimized with respect to the current learning rate at each iteration, so the current learning rate can be written as an expression that depends only on the historical learning rates. Then, given one initial value, the learning rates for the different training stages are obtained adaptively. The proposed method has the following advantages: (1) it is automatic and requires no hyper-parameters other than a single initial value, reducing manual tuning; (2) it converges faster and outperforms standard SGD; (3) it lets neural networks reach better performance with fewer gradient communication bits. Three numerical experiments are conducted on different neural networks with three public datasets (MNIST, CIFAR-10 and CIFAR-100), and the results demonstrate the efficiency of the proposed approach.
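The following minimal sketch (Python with NumPy) illustrates the two ingredients described in the abstract: signsgd_step applies the sign-based update, which needs only one bit per gradient coordinate, while next_learning_rate stands in for the schedule's interface, which consumes the history of previous learning rates plus a single initial value. The decay rule inside next_learning_rate is a hypothetical placeholder, not the closed-form expression derived in the paper.

    import numpy as np

    def signsgd_step(params, grads, lr):
        # SIGNSGD update: move each parameter by lr times the sign of its
        # minibatch stochastic gradient, so only one bit per coordinate
        # has to be communicated between workers.
        return [p - lr * np.sign(g) for p, g in zip(params, grads)]

    def next_learning_rate(lr_history, lr0):
        # Hypothetical placeholder schedule (not the paper's formula): the paper
        # derives the current learning rate from the historical learning rates by
        # minimizing an upper bound on the convergence rate; this simple decay
        # only illustrates the "history in, next learning rate out" interface.
        if not lr_history:
            return lr0
        return lr0 / np.sqrt(1.0 + len(lr_history))

    # Toy usage: minimize f(w) = ||w||^2 with sign-based updates.
    w = [np.array([3.0, -2.0])]
    lr_history = []
    for _ in range(1000):
        grads = [2.0 * w[0]]                          # exact gradient of ||w||^2
        lr = next_learning_rate(lr_history, lr0=0.1)  # step size from the history
        lr_history.append(lr)
        w = signsgd_step(w, grads, lr)
    print(w[0])  # ends near the origin, up to the final step size

Keeping the step rule separate from the schedule mirrors the property claimed in the abstract: the schedule depends only on the learning rate history and one initial value, so it can be swapped in without touching the sign-based update itself.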
Pages: 803-816
Number of pages: 13
Related Papers
50 items in total
  • [11] ADAPID: AN ADAPTIVE PID OPTIMIZER FOR TRAINING DEEP NEURAL NETWORKS
    Weng, Boxi
    Sun, Jian
    Sadeghi, Alireza
    Wang, Gang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3943 - 3947
  • [12] KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks
    Pauloski, J. Gregory
    Huang, Qi
    Huang, Lei
    Venkataraman, Shivaram
    Chard, Kyle
    Foster, Ian
    Zhang, Zhao
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [13] A scalable second order optimizer with an adaptive trust region for neural networks
    Yang, Donghee
    Cho, Junhyun
    Lee, Sungchul
    NEURAL NETWORKS, 2023, 167 : 692 - 705
  • [14] A differential adaptive learning rate method for back-propagation neural networks
    Department of Computer Engineering, Azad University of Qazvin, Iran
    World Acad. Sci. Eng. Technol., 2009: 289 - 292
  • [15] Neural Networks: Different problems require different learning rate adaptive methods
    Allard, R
    Faubert, J
    IMAGE PROCESSING: ALGORITHMS AND SYSTEMS III, 2004, 5298 : 516 - 527
  • [16] CONSTRUCTIVE APPROACHES FOR TRAINING OF WAVELET NEURAL NETWORKS USING ADAPTIVE LEARNING RATE
    Skhiri, Mohamed Zine El Abidine
    Chtourou, Mohamed
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2013, 11 (03)
  • [17] Approximating Algorithm of Wavelet Neural Networks with Self-adaptive Learning Rate
    Gan Xusheng
    Duanmu Jingshu
    Wang Qing
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 968 - 972
  • [18] A Differential Adaptive Learning Rate Method for Back-Propagation Neural Networks
    Iranmanesh, Saeid
    NN'09: PROCEEDINGS OF THE 10TH WSEAS INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, 2009, : 30 - 34
  • [19] An analytical approach to hardware-friendly adaptive learning rate neural networks
    Rezaie, MG
    Farbiz, F
    Fakhraie, SM
    16TH INTERNATIONAL CONFERENCE ON MICROELECTRONICS, PROCEEDINGS, 2004, : 320 - 323
  • [20] Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks
    Iiduka, Hideaki
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13250 - 13261