On the training dynamics of deep networks with L2 regularization

Cited by: 0
Authors: Lewkowycz, Aitor [1]; Gur-Ari, Guy [1]
Affiliation: [1] Google, Mountain View, CA 94043 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study the role of L2 regularization in deep learning, and uncover simple relations between the performance of the model, the L2 coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of L2 regularization in this context with that of linear models.
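The abstract describes training with an L2 penalty on the weights and a dynamical schedule for the regularization coefficient. As an illustration only, the sketch below applies gradient descent to a toy regression loss plus lam * ||w||^2 and decays lam over training; the toy model, the particular decay rule, and every hyperparameter value are assumptions made for exposition, not the authors' method.

```python
# Minimal illustrative sketch (not the paper's code): gradient descent on a
# toy linear-regression loss with an explicit L2 penalty lam * ||w||^2.
# The model, data, decay schedule, and hyperparameters are assumptions
# chosen only to make the setup concrete.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))                 # toy inputs
w_true = rng.normal(size=32)                   # ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=256)    # noisy targets

def grad(w, lam):
    """Gradient of mean squared error plus the L2 term lam * ||w||^2."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y) + 2.0 * lam * w

w = np.zeros(32)
lr = 0.05       # learning rate (illustrative value)
lam0 = 1e-3     # initial L2 coefficient (illustrative value)
for step in range(2000):
    # Stand-in "dynamical schedule": decay the L2 coefficient during training.
    # The paper proposes such a schedule; this particular decay rule is only
    # a placeholder, not the authors' schedule.
    lam = lam0 / (1.0 + step / 500.0)
    w -= lr * grad(w, lam)

print("final distance to w_true:", np.linalg.norm(w - w_true))
```

The same pattern appears as a weight-decay or L2 term in standard deep learning frameworks; the abstract's empirical relations concern how this coefficient interacts with the learning rate and the number of training steps in overparameterized networks.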
Pages: 10
Related papers (50 in total; items [41]-[50] shown)
  • [41] Word training and teaching Spanish LE/L2. Sanchez Martin, Francisco Javier. REVISTA DE INVESTIGACION LINGUISTICA, 2019, 22: 513-516.
  • [42] Influence of musical training on perception of L2 speech. Sadakata, Makiko; van der Zanden, Lotte; Sekiyama, Kaoru. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010: 118+.
  • [43] Memorizing articulatory habits in L2 with voice training. Knoerr, Helene. RECHERCHE ET PRATIQUES PEDAGOGIQUES EN LANGUES DE SPECIALITE-CAHIERS DE L APLIUT, 2006, 25 (02): 88-111.
  • [44] Image Reconstruction in Ultrasonic Transmission Tomography Using L1/L2 Regularization. Li, Aoyu; Liang, Guanghui; Dong, Feng. 2024 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, I2MTC 2024, 2024.
  • [45] Sentiment Analysis of Tweets by Convolution Neural Network with L1 and L2 Regularization. Rangra, Abhilasha; Sehgal, Vivek Kumar; Shukla, Shailendra. ADVANCED INFORMATICS FOR COMPUTING RESEARCH, ICAICR 2018, PT I, 2019, 955: 355-365.
  • [46] Euclid in a Taxicab: Sparse Blind Deconvolution with Smoothed l1/l2 Regularization. Repetti, Audrey; Mai Quyen Pham; Duval, Laurent; Chouzenoux, Emilie; Pesquet, Jean-Christophe. IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (05): 539-543.
  • [47] αl1-βl2 sparsity regularization for nonlinear ill-posed problems. Ding, Liang; Han, Weimin. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2024, 450.
  • [48] Mixed l2 and l1-norm regularization for adaptive detrending with ARMA modeling. Giarre, L.; Argenti, F. JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2018, 355 (03): 1493-1511.
  • [49] On l2 data fitting and modified nonconvex nonsmooth regularization for image recovery. Xiao, Jin; Yang, Yu-Fei; Yuan, Xiao. JOURNAL OF COMPUTATIONAL ANALYSIS AND APPLICATIONS, 2013, 15 (02): 264-279.
  • [50] Training Compact DNNs with l1/2 Regularization. Tang, Anda; Niu, Lingfeng; Miao, Jianyu; Zhang, Peng. PATTERN RECOGNITION, 2023, 136.