An Adaptive Learning Rate Schedule for SIGNSGD Optimizer in Neural Networks

Cited by: 0
Authors
Kang Wang
Tao Sun
Yong Dou
Affiliations
[1] National University of Defense Technology, The National Laboratory for Parallel and Distributed Processing, School of Computer
Source
Neural Processing Letters | 2022 / Volume 54
Keywords
SIGNSGD optimizer; An adaptive learning rate strategy; Communication; Fast convergence; Neural networks
DOI
Not available
Abstract
SIGNSGD can dramatically improve the efficiency of training large neural networks by transmitting only the sign of each minibatch stochastic gradient, which compresses gradient communication while retaining a convergence rate at the level of standard stochastic gradient descent (SGD). Meanwhile, the learning rate plays a vital role in training neural networks, but existing learning rate strategies face the following problems: (1) learning rate decay methods produce small learning rates that slow convergence, and they require extra hyper-parameters beyond the initial learning rate, increasing manual involvement; (2) adaptive gradient algorithms generalize poorly and also rely on additional hyper-parameters; (3) generating learning rates via two-level optimization models is difficult and time-consuming during training. To this end, we propose, for the first time, a novel adaptive learning rate schedule for neural network training with the SIGNSGD optimizer. Our method builds on the theoretical observation that the upper bound on the convergence rate is minimized with respect to the current learning rate at each iteration, so the current learning rate can be written as an expression that depends only on the historical learning rates. Then, given one initial value, the learning rates for the different training stages are obtained adaptively. The proposed method has the following advantages: (1) it is automatic and requires no hyper-parameters other than a single initial value, reducing manual tuning; (2) it converges faster and outperforms standard SGD; (3) it lets neural networks reach better performance with fewer gradient communication bits. Three numerical experiments are conducted on different neural networks with three public datasets (MNIST, CIFAR-10 and CIFAR-100), and the results demonstrate the efficiency of the proposed approach.
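The following minimal sketch (Python with NumPy) illustrates the two ingredients described in the abstract: signsgd_step applies the sign-based update, which needs only one bit per gradient coordinate, while next_learning_rate stands in for the schedule's interface, which consumes the history of previous learning rates plus a single initial value. The decay rule inside next_learning_rate is a hypothetical placeholder, not the closed-form expression derived in the paper.

    import numpy as np

    def signsgd_step(params, grads, lr):
        # SIGNSGD update: move each parameter by lr times the sign of its
        # minibatch stochastic gradient, so only one bit per coordinate
        # has to be communicated between workers.
        return [p - lr * np.sign(g) for p, g in zip(params, grads)]

    def next_learning_rate(lr_history, lr0):
        # Hypothetical placeholder schedule (not the paper's formula): the paper
        # derives the current learning rate from the historical learning rates by
        # minimizing an upper bound on the convergence rate; this simple decay
        # only illustrates the "history in, next learning rate out" interface.
        if not lr_history:
            return lr0
        return lr0 / np.sqrt(1.0 + len(lr_history))

    # Toy usage: minimize f(w) = ||w||^2 with sign-based updates.
    w = [np.array([3.0, -2.0])]
    lr_history = []
    for _ in range(1000):
        grads = [2.0 * w[0]]                          # exact gradient of ||w||^2
        lr = next_learning_rate(lr_history, lr0=0.1)  # step size from the history
        lr_history.append(lr)
        w = signsgd_step(w, grads, lr)
    print(w[0])  # ends near the origin, up to the final step size

Keeping the step rule separate from the schedule mirrors the property claimed in the abstract: the schedule depends only on the learning rate history and one initial value, so it can be swapped in without touching the sign-based update itself.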
Pages: 803-816
Number of pages: 13
Related Papers
50 items in total
  • [11] ADAPID: AN ADAPTIVE PID OPTIMIZER FOR TRAINING DEEP NEURAL NETWORKS
    Weng, Boxi
    Sun, Jian
    Sadeghi, Alireza
    Wang, Gang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3943 - 3947
  • [12] KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks
    Pauloski, J. Gregory
    Huang, Qi
    Huang, Lei
    Venkataraman, Shivaram
    Chard, Kyle
    Foster, Ian
    Zhang, Zhao
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [13] A scalable second order optimizer with an adaptive trust region for neural networks
    Yang, Donghee
    Cho, Junhyun
    Lee, Sungchul
    NEURAL NETWORKS, 2023, 167 : 692 - 705
  • [14] A differential adaptive learning rate method for back-propagation neural networks
    Department of Computer Engineering, Azad University of Qazvin, Iran
    World Acad. Sci. Eng. Technol., 2009: 289 - 292
  • [15] Neural Networks: Different problems require different learning rate adaptive methods
    Allard, R
    Faubert, J
    IMAGE PROCESSING: ALGORITHMS AND SYSTEMS III, 2004, 5298 : 516 - 527
  • [16] CONSTRUCTIVE APPROACHES FOR TRAINING OF WAVELET NEURAL NETWORKS USING ADAPTIVE LEARNING RATE
    Skhiri, Mohamed Zine El Abidine
    Chtourou, Mohamed
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2013, 11 (03)
  • [17] Approximating Algorithm of Wavelet Neural Networks with Self-adaptive Learning Rate
    Gan Xusheng
    Duanmu Jingshu
    Wang Qing
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 968 - 972
  • [18] A Differential Adaptive Learning Rate Method for Back-Propagation Neural Networks
    Iranmanesh, Saeid
    NN'09: PROCEEDINGS OF THE 10TH WSEAS INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, 2009, : 30 - 34
  • [19] An analytical approach to hardware-friendly adaptive learning rate neural networks
    Rezaie, MG
    Farbiz, F
    Fakhraie, SM
    16TH INTERNATIONAL CONFERENCE ON MICROELECTRONICS, PROCEEDINGS, 2004, : 320 - 323
  • [20] Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks
    Iiduka, Hideaki
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13250 - 13261