On the training dynamics of deep networks with L2 regularization

Cited by: 0
Authors: Lewkowycz, Aitor [1]; Gur-Ari, Guy [1]
Affiliation: [1] Google, Mountain View, CA 94043 USA
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study the role of L2 regularization in deep learning, and uncover simple relations between the performance of the model, the L2 coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations, we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of L2 regularization in this context with that of linear models.
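The relations described in the abstract link performance, the L2 coefficient λ, the learning rate η, and the number of training steps T, and are said to predict the optimal regularization parameter. The abstract does not quote their functional form, so the sketch below is only an illustration: it assumes the number of steps needed to reach peak performance scales roughly like c/(λη) for a constant c estimated from a pilot run, and inverts that relation to pick λ for a fixed step budget. The helper l2_sgd, the toy quadratic objective, and the constant c are all hypothetical, not taken from the paper.

```python
import numpy as np

# Minimal sketch: gradient descent with an explicit L2 penalty,
#   w <- w - lr * (grad L(w) + lam * w),
# plus a hypothetical rule lam ~ c / (lr * T) for choosing the coefficient
# under a fixed step budget T. The scaling and the constant c are
# illustrative assumptions, not values from the paper.

def l2_sgd(grad_fn, w, lr, lam, num_steps):
    """Run `num_steps` of gradient descent with L2 coefficient `lam`."""
    for _ in range(num_steps):
        w = w - lr * (grad_fn(w) + lam * w)
    return w

# Toy objective: L(w) = 0.5 * ||A w - b||^2, with analytic gradient.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
grad = lambda w: A.T @ (A @ w - b)

lr = 1e-3        # learning rate (eta)
budget = 5_000   # training-step budget (T)
c = 5.0          # hypothetical constant, e.g. fit from one pilot sweep

lam = c / (lr * budget)  # assumed relation: lam proportional to 1/(eta * T)
w = l2_sgd(grad, np.zeros(10), lr, lam, budget)
print(f"lambda = {lam:.3f}, final loss = {0.5 * np.sum((A @ w - b) ** 2):.4f}")
```

In practice, c would be fit by sweeping λ at a single learning rate and recording where test performance peaks, after which the relation extrapolates to other budgets.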
Pages: 10
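The dynamical schedule for the regularization parameter mentioned in the abstract is likewise not specified here. The sketch below shows one hedged interpretation: a decay-on-plateau rule that starts with a strong penalty and halves the coefficient whenever the loss stops improving. The trigger, the halving factor, and the scheduled_l2_sgd helper are assumptions for illustration, not the authors' published schedule.

```python
import numpy as np

# Hedged sketch of a *dynamical* L2 schedule: begin with a strong penalty and
# halve it whenever the loss plateaus. The decay-on-plateau trigger is an
# illustrative assumption; the abstract only states that a dynamical schedule
# exists, not its form.

def scheduled_l2_sgd(loss_fn, grad_fn, w, lr, lam0, num_steps, patience=200):
    """Gradient descent whose L2 coefficient decays when progress stalls."""
    lam, best, stall = lam0, np.inf, 0
    for _ in range(num_steps):
        w = w - lr * (grad_fn(w) + lam * w)
        loss = loss_fn(w)
        if loss < best - 1e-8:
            best, stall = loss, 0
        else:
            stall += 1
        if stall >= patience:   # plateau: weaken the regularizer and continue
            lam, stall = lam / 2, 0
    return w, lam

# Same toy quadratic objective as above.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
loss = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
grad = lambda w: A.T @ (A @ w - b)

w, lam_final = scheduled_l2_sgd(loss, grad, np.zeros(10), lr=1e-3,
                                lam0=10.0, num_steps=5_000)
print(f"final lambda = {lam_final:.4f}, final loss = {loss(w):.4f}")
```

A real implementation would monitor a validation metric rather than the training loss and decay the coefficient more conservatively, but the structure, strong early regularization that relaxes over training, is the same.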
Related papers (showing items [21]-[30] of 50)
  • [21] Network as Regularization for Training Deep Neural Networks: Framework, Model and Performance
    Tian, Kai
    Xu, Yi
    Guan, Jihong
    Zhou, Shuigeng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6013 - 6020
  • [22] A projected gradient method for αl1-βl2 sparsity regularization
    Ding, Liang
    Han, Weimin
    INVERSE PROBLEMS, 2020, 36 (12)
  • [23] An l2/l1 regularization framework for diverse learning tasks
    Wang, Shengzheng
    Peng, Jing
    Liu, Wei
    SIGNAL PROCESSING, 2015, 109 : 206 - 211
  • [24] Approximation of L2 Element Using Generalized Rational Fraction with Respect to Standard L2 and Regularization of Approximation Set
    Wolf, J
    COMPTES RENDUS HEBDOMADAIRES DES SEANCES DE L ACADEMIE DES SCIENCES SERIE A, 1974, 278 (17) : 1111 - 1113
  • [25] Lexical Stress Detection for L2 English Speech Using Deep Belief Networks
    Li, Kun
    Qian, Xiaojun
    Kang, Shiyin
    Meng, Helen
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1810 - 1814
  • [26] Group analysis of fMRI data using L1 and L2 regularization
    Overholser, Rosanna
    Xu, Ronghui
    STATISTICS AND ITS INTERFACE, 2015, 8 (03) : 379 - 390
  • [27] A Hidden Feature Selection Method based on l2,0-Norm Regularization for Training Single-hidden-layer Neural Networks
    Liu, Zhiwei
    Yu, Yuanlong
    Sun, Zhenzhen
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 1810 - 1817
  • [28] Combined l2 data and gradient fitting in conjunction with l1 regularization
    Didas, Stephan
    Setzer, Simon
    Steidl, Gabriele
    ADVANCES IN COMPUTATIONAL MATHEMATICS, 2009, 30 (01) : 79 - 99
  • [29] Sparse portfolio optimization via l1 over l2 regularization
    Wu, Zhongming
    Sun, Kexin
    Ge, Zhili
    Allen-Zhao, Zhihua
    Zeng, Tieyong
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 319 (03) : 820 - 833
  • [30] Morozov's Discrepancy Principle for αl1-βl2 Sparsity Regularization
    Ding, Liang
    Han, Weimin
    INVERSE PROBLEMS AND IMAGING, 2023, 17 (01) : 157 - 179