Cyclical Learning Rates for Training Neural Networks

Cited by: 1498
Authors
Smith, Leslie N. [1]
Institution
[1] US Naval Research Laboratory, Code 5514, 4555 Overlook Ave SW, Washington, DC 20375, USA
DOI
10.1109/WACV.2017.58
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
It is known that the learning rate is the most important hyper-parameter to tune for training deep neural networks. This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates. Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values. Training with cyclical learning rates instead of fixed values achieves improved classification accuracy without a need to tune and often in fewer iterations. This paper also describes a simple way to estimate "reasonable bounds" - linearly increasing the learning rate of the network for a few epochs. In addition, cyclical learning rates are demonstrated on the CIFAR-10 and CIFAR-100 datasets with ResNets, Stochastic Depth networks, and DenseNets, and the ImageNet dataset with the AlexNet and GoogLeNet architectures. These are practical tools for everyone who trains neural networks.
Pages: 464-472
Number of pages: 9
Related Papers
50 records in total
  • [1] Yu, Changyong; Qi, Xin; Ma, Haitao; He, Xin; Wang, Cuirong; Zhao, Yuhai. LLR: Learning learning rates by LSTM for training neural networks. NEUROCOMPUTING, 2020, 394: 41-50
  • [2] Riaza, R; Zufiria, PJ. Rates of learning in gradient and genetic training of recurrent neural networks. ARTIFICIAL NEURAL NETS AND GENETIC ALGORITHMS, 1999: 95-99
  • [3] Deng, Jiali; Gong, Haigang; Liu, Minghui; Xie, Tianshu; Cheng, Xuan; Wang, Xiaomin; Liu, Ming. RALR: Random Amplify Learning Rates for Training Neural Networks. APPLIED SCIENCES-BASEL, 2022, 12(01)
  • [4] Iiduka, Hideaki. Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52(12): 13250-13261
  • [5] Kirkedal, Andreas; Kim, Yeon-Jun. Multilingual Deep Neural Network Training using Cyclical Learning Rate. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 2933-2937
  • [6] Kim, Kyoung Joo; Park, Jin Bae; Choi, Yoon Ho. The adaptive learning rates of extended Kalman filter based training algorithm for wavelet neural networks. MICAI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4293: 327-+
  • [7] Smith, Leslie N.; Topin, Nicholay. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS, 2019, 11006
  • [8] Srinivas, Suraj; Kuzmin, Andrey; Nagel, Markus; van Baalen, Mart; Skliar, Andrii; Blankevoort, Tijmen. Cyclical Pruning for Sparse Neural Networks. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 2761-2770
  • [9] Gopalan, Arjun; Juan, Da-Cheng; Magalhaes, Cesar Ilharco; Ferng, Chun-Sung; Heydon, Allan; Lu, Chun-Ta; Pham, Philip; Yu, George. Neural Structured Learning: Training Neural Networks with Structured Signals. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 3501-3502
  • [10] Bui, Thang D.; Ravi, Sujith; Ramavajjala, Vivek. Neural Graph Learning: Training Neural Networks Using Graphs. WSDM'18: PROCEEDINGS OF THE ELEVENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2018: 64-71