On the training dynamics of deep networks with L2 regularization

Cited by: 0
Authors: Lewkowycz, Aitor [1]; Gur-Ari, Guy [1]
Affiliation: [1] Google, Mountain View, CA 94043 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study the role of L2 regularization in deep learning, and uncover simple relations between the performance of the model, the L2 coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of L2 regularization in this context with that of linear models.
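The abstract describes training with an L2 penalty on the weights and a dynamical schedule for the regularization coefficient. As an illustration only, the sketch below applies gradient descent to a toy regression loss plus lam * ||w||^2 and decays lam over training; the toy model, the particular decay rule, and every hyperparameter value are assumptions made for exposition, not the authors' method.

```python
# Minimal illustrative sketch (not the paper's code): gradient descent on a
# toy linear-regression loss with an explicit L2 penalty lam * ||w||^2.
# The model, data, decay schedule, and hyperparameters are assumptions
# chosen only to make the setup concrete.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))                 # toy inputs
w_true = rng.normal(size=32)                   # ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=256)    # noisy targets

def grad(w, lam):
    """Gradient of mean squared error plus the L2 term lam * ||w||^2."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y) + 2.0 * lam * w

w = np.zeros(32)
lr = 0.05       # learning rate (illustrative value)
lam0 = 1e-3     # initial L2 coefficient (illustrative value)
for step in range(2000):
    # Stand-in "dynamical schedule": decay the L2 coefficient during training.
    # The paper proposes such a schedule; this particular decay rule is only
    # a placeholder, not the authors' schedule.
    lam = lam0 / (1.0 + step / 500.0)
    w -= lr * grad(w, lam)

print("final distance to w_true:", np.linalg.norm(w - w_true))
```

The same pattern appears as a weight-decay or L2 term in standard deep learning frameworks; the abstract's empirical relations concern how this coefficient interacts with the learning rate and the number of training steps in overparameterized networks.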
Pages: 10
Related papers (50 in total; items [41]-[50] shown)
  • [41] Word training and teaching Spanish LE/L2. Sanchez Martin, Francisco Javier. REVISTA DE INVESTIGACION LINGUISTICA, 2019, 22: 513-516.
  • [42] Influence of musical training on perception of L2 speech. Sadakata, Makiko; van der Zanden, Lotte; Sekiyama, Kaoru. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010: 118+.
  • [43] Memorizing articulatory habits in L2 with voice training. Knoerr, Helene. RECHERCHE ET PRATIQUES PEDAGOGIQUES EN LANGUES DE SPECIALITE-CAHIERS DE L APLIUT, 2006, 25 (02): 88-111.
  • [44] Image Reconstruction in Ultrasonic Transmission Tomography Using L1/L2 Regularization. Li, Aoyu; Liang, Guanghui; Dong, Feng. 2024 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, I2MTC 2024, 2024.
  • [45] Sentiment Analysis of Tweets by Convolution Neural Network with L1 and L2 Regularization. Rangra, Abhilasha; Sehgal, Vivek Kumar; Shukla, Shailendra. ADVANCED INFORMATICS FOR COMPUTING RESEARCH, ICAICR 2018, PT I, 2019, 955: 355-365.
  • [46] Euclid in a Taxicab: Sparse Blind Deconvolution with Smoothed l1/l2 Regularization. Repetti, Audrey; Mai Quyen Pham; Duval, Laurent; Chouzenoux, Emilie; Pesquet, Jean-Christophe. IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (05): 539-543.
  • [47] αl1-βl2 sparsity regularization for nonlinear ill-posed problems. Ding, Liang; Han, Weimin. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2024, 450.
  • [48] Mixed l2 and l1-norm regularization for adaptive detrending with ARMA modeling. Giarre, L.; Argenti, F. JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2018, 355 (03): 1493-1511.
  • [49] On l2 data fitting and modified nonconvex nonsmooth regularization for image recovery. Xiao, Jin; Yang, Yu-Fei; Yuan, Xiao. JOURNAL OF COMPUTATIONAL ANALYSIS AND APPLICATIONS, 2013, 15 (02): 264-279.
  • [50] Training Compact DNNs with l1/2 Regularization. Tang, Anda; Niu, Lingfeng; Miao, Jianyu; Zhang, Peng. PATTERN RECOGNITION, 2023, 136.