On the training dynamics of deep networks with L2 regularization

Cited by: 0
Authors:
Lewkowycz, Aitor [1]
Gur-Ari, Guy [1]
Affiliations:
[1] Google, Mountain View, CA 94043 USA
Keywords: (none listed)
DOI: (not available)
CLC classification:
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
We study the role of L2 regularization in deep learning, and uncover simple relations between the performance of the model, the L2 coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of L2 regularization in this context with that of linear models.
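To make the relations described in the abstract concrete: under gradient flow, an L2 term with coefficient lambda turns the parameter update into dw/dt = -grad L(w) - lambda*w, i.e. plain gradient descent plus a uniform decay of the weights toward the origin. The sketch below (plain Python + NumPy) runs full-batch gradient descent with L2 regularization on a toy overparameterized regression problem, and adds a hypothetical plateau-triggered decay of lambda in the spirit of the dynamical schedule the abstract mentions. The toy data, the 0.999 plateau threshold, and the 0.9 decay factor are illustrative assumptions, not the authors' actual setup.

    # Minimal sketch: full-batch gradient descent with L2 regularization
    # on an overparameterized linear model, plus a hypothetical
    # plateau-based decay of the L2 coefficient. Illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)

    n_samples, n_params = 20, 200          # more parameters than samples
    X = rng.normal(size=(n_samples, n_params))
    y = X @ rng.normal(size=n_params)      # noiseless linear targets

    def loss(w):
        r = X @ w - y
        return 0.5 * np.mean(r ** 2)

    def grad(w):
        return X.T @ (X @ w - y) / n_samples

    eta = 0.01       # learning rate
    lam = 1e-2       # L2 coefficient ("weight decay")
    w = np.zeros(n_params)
    best = np.inf

    for step in range(1, 20001):
        # One step on loss(w) + (lam / 2) * ||w||^2.
        w -= eta * (grad(w) + lam * w)
        cur = loss(w)
        # Hypothetical dynamical schedule: shrink lam whenever the
        # training loss stops improving (the paper's actual rule may differ).
        if cur > 0.999 * best:
            lam *= 0.9
        best = min(best, cur)
        if step % 5000 == 0:
            print(f"step={step:6d}  loss={cur:.3e}  lambda={lam:.3e}")

With a fixed lam, the loss stalls at a lam-dependent floor; letting lam decay once progress stalls allows training to continue toward a lower loss, which is one way to read the speed-up the abstract attributes to a dynamical schedule.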
Pages: 10
Related papers (50 items total)
  • [31] Near-optimal Parameter Selection Methods for l2 Regularization. Ballal, Tarig; Suliman, Mohamed; Al-Naffouri, Tareq Y. 2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017: 1295-1299.
  • [32] Blind Image Restoration Based on l1-l2 Blur Regularization. Xiao, Su. ENGINEERING LETTERS, 2020, 28 (01): 148-154.
  • [33] An Improved Variable Kernel Density Estimator Based on L2 Regularization. Jin, Yi; He, Yulin; Huang, Defa. MATHEMATICS, 2021, 9 (16).
  • [34] Weighted Multiview K-Means Clustering with L2 Regularization. Hussain, Ishtiaq; Nataliani, Yessica; Ali, Mehboob; Hussain, Atif; Mujlid, Hana M.; Almaliki, Faris A.; Rahimi, Nouf M. SYMMETRY-BASEL, 2024, 16 (12).
  • [35] Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows. Yoo, Gene Ryan; Owhadi, Houman. PHYSICA D-NONLINEAR PHENOMENA, 2021, 426.
  • [36] Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise. Rusiecki, Andrzej. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA 2021), 2022, 418: 57-66.
  • [37] Ising-Dropout: A Regularization Method for Training and Compression of Deep Neural Networks. Salehinejad, Hojjat; Valaee, Shahrokh. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 3602-3606.
  • [38] Differential impacts of natural L2 immersion and intensive classroom L2 training on cognitive control. Xie, Zhilong; Antolovic, Katarina. QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2022, 75 (03): 550-562.
  • [39] Batch gradient method with smoothing L1/2 regularization for training of feedforward neural networks. Wu, Wei; Fan, Qinwei; Zurada, Jacek M.; Wang, Jian; Yang, Dakun; Liu, Yan. NEURAL NETWORKS, 2014, 50: 72-78.
  • [40] Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks. Li, Kun; Qian, Xiaojun; Meng, Helen. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (01): 193-207.