Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

Cited by: 0
Authors
Zhao, Yang [1 ]
Zhang, Hao [1 ]
Hu, Xiuyuan [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
How to train deep neural networks (DNNs) that generalize well is a central concern in deep learning, especially for today's severely overparameterized networks. In this paper, we propose an effective method to improve model generalization by additionally penalizing the gradient norm of the loss function during optimization. We demonstrate that confining the gradient norm of the loss function helps lead optimizers toward flat minima. We leverage a first-order approximation to efficiently implement the corresponding gradient so that it fits well within the gradient descent framework. In our experiments, we confirm that our method improves the generalization performance of various models on different datasets. We also show that the recent sharpness-aware minimization (SAM) method (Foret et al., 2021) is a special, but not the best, case of our method, and that the best case of our method achieves new state-of-the-art performance on these tasks. Code is available at https://github.com/zhaoyang-0204/gnp.
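To make the mechanism concrete, below is a minimal PyTorch sketch of a gradient-norm-penalty update of the kind the abstract describes: the penalty term λ‖∇L(w)‖ contributes a Hessian-vector product to the update direction, which can be approximated to first order by a finite difference between the gradient at the current weights and the gradient at weights perturbed along the normalized gradient direction. The function name gnp_step and the hyperparameters r (perturbation radius) and alpha (mixing weight) are illustrative assumptions, not names from the paper or its released code; setting alpha = 1 yields a SAM-style step, consistent with the abstract's remark that sharpness-aware minimization is a special case.

```python
import torch

def gnp_step(model, loss_fn, x, y, optimizer, r=0.05, alpha=0.8):
    """One optimization step with a gradient-norm penalty (illustrative sketch).

    Effective update direction:
        (1 - alpha) * grad(L)(w) + alpha * grad(L)(w + r * g / ||g||),
    where g = grad(L)(w). alpha = 1 recovers a SAM-style step.
    """
    # First forward-backward pass: gradient g = dL/dw at the current weights.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    # Assumes every parameter receives a gradient in the backward pass.
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    scale = r / (grad_norm.item() + 1e-12)

    # Perturb the weights along the normalized gradient direction.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(g, alpha=scale)

    # Second pass: the gradient at the perturbed weights approximates the
    # Hessian-vector term of the norm penalty via a finite difference.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then mix the two gradients in place.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(g, alpha=scale)
            p.grad.mul_(alpha).add_(g, alpha=1.0 - alpha)

    optimizer.step()
```

Each step in this sketch costs two forward-backward passes, so it is roughly twice as expensive per step as plain SGD; r and alpha would need to be tuned per task.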
Pages: 11
Related Papers (50 total; first 10 shown)
  • [1] Improving generalization for geometric variations in images for efficient deep learning. Grover, Shivam; Sidana, Kshitij; Jain, Vanita; Jain, Rachna; Nayyar, Anand. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83(23): 63169-63191.
  • [2] Improving generalization performance of natural gradient learning using optimized regularization by NIC. Park, H; Murata, N; Amari, S. NEURAL COMPUTATION, 2004, 16(02): 355-382.
  • [3] Improving Generalization of Deep Reinforcement Learning-based TSP Solvers. Ouyang, Wenbin; Wang, Yisen; Han, Shaochen; Jin, Zhejian; Weng, Paul. 2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021.
  • [4] Natural gradient works efficiently in learning. Amari, S. NEURAL COMPUTATION, 1998, 10(02): 251-276.
  • [5] Natural gradient works efficiently in learning. Amari, S. KNOWLEDGE-BASED INTELLIGENT INFORMATION ENGINEERING SYSTEMS & ALLIED TECHNOLOGIES, PTS 1 AND 2, 2001, 69: 11-14.
  • [6] A Diversity-Penalizing Ensemble Training Method for Deep Learning. Zhang, Xiaohui; Povey, Daniel; Khudanpur, Sanjeev. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 3590-3594.
  • [7] Improving generalization with active learning. Cohn, D; Atlas, L; Ladner, R. MACHINE LEARNING, 1994, 15(02): 201-221.
  • [8] Improving the generalization performance of deep networks by dual pattern learning with adversarial adaptation. Zhang, Haimin; Xu, Min. KNOWLEDGE-BASED SYSTEMS, 2020, 200.
  • [9] Exploring Generalization in Deep Learning. Neyshabur, Behnam; Bhojanapalli, Srinadh; McAllester, David; Srebro, Nathan. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30.
  • [10] Verifying Generalization in Deep Learning. Amir, Guy; Maayan, Osher; Zelazny, Tom; Katz, Guy; Schapira, Michael. COMPUTER AIDED VERIFICATION, CAV 2023, PT II, 2023, 13965: 438-455.