Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

Cited by: 0
Authors
Zhao, Yang [1]
Zhang, Hao [1]
Hu, Xiuyuan [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Training deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for today's severely overparameterized networks. In this paper, we propose an effective method for improving model generalization by additionally penalizing the gradient norm of the loss function during optimization. We demonstrate that constraining the gradient norm of the loss function helps lead optimizers towards flat minima. We use a first-order approximation to compute the corresponding gradient efficiently, so that it fits well within the gradient descent framework. In our experiments, we confirm that our method can improve the generalization performance of various models on different datasets. We also show that the recent sharpness-aware minimization method (Foret et al., 2021) is a special, but not the best, case of our method, and that the best case of our method can achieve new state-of-the-art performance on these tasks. Code is available at https://github.com/zhaoyang-0204/gnp.
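The idea in the abstract can be illustrated with a minimal sketch. It assumes the penalized objective has the form L(theta) + lam * ||grad L(theta)||_2 and that the Hessian-vector product in its gradient is replaced by a finite difference of two gradients; the abstract only says a first-order approximation is used, so this particular form is an assumption. The names `gnp_gradient`, `lam`, `r`, and the toy quadratic loss are hypothetical and chosen for illustration; this is not the code from the authors' repository.

```python
import numpy as np


def loss_grad(theta, A, b):
    """Gradient of the toy quadratic loss 0.5 * theta^T A theta - b^T theta."""
    return A @ theta - b


def gnp_gradient(theta, grad_fn, lam=0.1, r=0.01, eps=1e-12):
    """Approximate gradient of L(theta) + lam * ||grad L(theta)||_2.

    The exact gradient is g + lam * H g / ||g||, where g is the loss gradient
    and H the Hessian. Here the Hessian-vector product H v is replaced by the
    finite difference (grad(theta + r * v) - grad(theta)) / r, so only two
    gradient evaluations are needed (assumed form, see lead-in).
    """
    g = grad_fn(theta)
    v = g / (np.linalg.norm(g) + eps)   # unit vector along the gradient
    g_shift = grad_fn(theta + r * v)    # gradient at the perturbed point
    alpha = lam / r                     # mixing weight between the two gradients
    return (1.0 - alpha) * g + alpha * g_shift


if __name__ == "__main__":
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -2.0])
    theta = np.array([0.3, -0.7])
    step = gnp_gradient(theta, lambda t: loss_grad(t, A, b), lam=0.1, r=0.01)
    print(step)
```

In this sketch, choosing `lam` equal to `r` makes the mixing weight 1, so the update uses only the gradient at the perturbed point. Under the stated assumptions, that is one way to read the abstract's remark that sharpness-aware minimization is a special case of the method.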
Pages: 11
Related papers
50 in total
  • [31] Learning Gradient Descent: Better Generalization and Longer Horizons
    Lv, Kaifeng
    Jiang, Shunhua
    Li, Jian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [32] Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
    Zhang, Xingxuan
    Xu, Renzhe
    Yu, Han
    Zou, Hao
    Cui, Peng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 20247 - 20257
  • [33] Empirical Analysis of Generalization and Learning in XCS with Gradient Descent
    Lanzi, Pier Luca
    Butz, Martin V.
    Goldberg, David E.
    GECCO 2007: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOL 1 AND 2, 2007, : 1814 - +
  • [34] A Deep Learning Approach for Norm Conflict Identification
    Aires, Joao Paulo
    Meneguzzi, Felipe
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1451 - 1453
  • [35] PathoWAve: A Deep Learning-based Weight Averaging Method for Improving Domain Generalization in Histopathology Images
    Sharifi, Parastoo Sotoudeh
    Ahmad, M. Omair
    Swamy, M. N. S.
    2024 IEEE 67TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, MWSCAS 2024, 2024, : 975 - 979
  • [36] AFSE: towards improving model generalization of deep graph learning of ligand bioactivities targeting GPCR proteins
    Yin, Yueming
    Hu, Haifeng
    Yang, Zhen
    Jiang, Feihu
    Huang, Yihe
    Wu, Jiansheng
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (03)
  • [37] Methods for Improving Deep Learning-Based Cardiac Auscultation Accuracy: Data Augmentation and Data Generalization
    Jeong, Yoojin
    Kim, Juhee
    Kim, Daeyeol
    Kim, Jinsoo
    Lee, Kwangkee
    APPLIED SCIENCES-BASEL, 2021, 11 (10)
  • [38] Limitations of the NTK for Understanding Generalization in Deep Learning
    Vyas, Nikhil
    Bansal, Yamini
    Nakkiran, Preetum
    arXiv, 2022
  • [39] An Optimal Transport Analysis on Generalization in Deep Learning
    Zhang, Jingwei
    Liu, Tongliang
    Tao, Dacheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (06) : 2842 - 2853
  • [40] Understanding Surprising Generalization Phenomena in Deep Learning
    Hu, Wei
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22669 - 22669