Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

Cited: 0
Authors
Zhao, Yang [1]
Zhang, Hao [1]
Hu, Xiuyuan [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
How to train deep neural networks (DNNs) that generalize well is a central concern in deep learning, especially for today's severely overparameterized networks. In this paper, we propose an effective method for improving model generalization by additionally penalizing the gradient norm of the loss function during optimization. We demonstrate that confining the gradient norm of the loss function helps lead optimizers toward flat minima. We leverage a first-order approximation to efficiently compute the corresponding gradient so that the method fits well into the gradient descent framework. In our experiments, we confirm that our method improves the generalization performance of various models on different datasets. We also show that the recent sharpness-aware minimization method (Foret et al., 2021) is a special, but not the best, case of our method, and that the best case of our method achieves new state-of-the-art performance on these tasks. Code is available at https://github.com/zhaoyang-0204/gnp.
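The abstract describes penalizing the gradient norm of the training loss and computing the penalty's gradient with a first-order (finite-difference) approximation. Below is a minimal PyTorch sketch of one such update, written from the abstract rather than from the paper or its repository (https://github.com/zhaoyang-0204/gnp); the function `gnp_step` and the hyperparameters `rho` (perturbation radius) and `alpha` (mixing coefficient, with `alpha = 1` recovering a SAM-like step) are illustrative assumptions.

```python
# A minimal sketch of gradient-norm penalization with a first-order
# (finite-difference) approximation. NOT the authors' reference implementation;
# `gnp_step`, `rho`, and `alpha` are illustrative assumptions.
import torch

def gnp_step(model, loss_fn, x, y, optimizer, rho=0.05, alpha=0.8):
    """One update approximately descending L(w) + lambda * ||grad L(w)||.

    The penalty gradient involves a Hessian-vector product, approximated here
    by the difference of gradients at w and at a point perturbed along the
    normalized gradient direction; alpha plays the role of lambda / rho, and
    alpha = 1 gives a SAM-like update.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the plain loss at the current parameters w.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12
    scale = (rho / grad_norm).item()

    # Move to w + rho * grad / ||grad|| and take the gradient there.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=scale)
    loss_pert = loss_fn(model(x), y)
    grads_pert = torch.autograd.grad(loss_pert, params)

    # Restore w, then apply the combined gradient
    # (1 - alpha) * grad(w) + alpha * grad(w + rho * grad / ||grad||).
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(g, alpha=scale)
        for p, g, gp in zip(params, grads, grads_pert):
            p.grad = (1.0 - alpha) * g + alpha * gp
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```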
Pages: 11
Related Papers
50 records in total
  • [21] FuCiTNet: Improving the generalization of deep learning networks by the fusion of learned class-inherent transformations
    Rey-Area, Manuel
    Guirado, Emilio
    Tabik, Siham
    Ruiz-Hidalgo, Javier
    INFORMATION FUSION, 2020, 63 : 188 - 195
  • [22] Improving the generalization of deep learning methods to segment the left ventricle in short axis MR images
    Graves, Catharine V.
    Moreno, Ramon A.
    Rebelo, Marina S.
    Nomura, Cesar H.
    Gutierrez, Marco A.
    42ND ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY: ENABLING INNOVATIVE TECHNOLOGIES FOR GLOBAL HEALTHCARE EMBC'20, 2020, : 1203 - 1206
  • [23] Precipitation Nowcasting with Orographic Enhanced Stacked Generalization: Improving Deep Learning Predictions on Extreme Events
    Franch, Gabriele
    Nerini, Daniele
    Pendesini, Marta
    Coviello, Luca
    Jurman, Giuseppe
    Furlanello, Cesare
    ATMOSPHERE, 2020, 11 (03)
  • [24] Efficiently Trained Deep Learning Potential for Graphane
    Achar, Siddarth K.
    Zhang, Linfeng
    Johnson, J. Karl
    JOURNAL OF PHYSICAL CHEMISTRY C, 2021, 125 (27): 14874 - 14882
  • [25] Improving Generalization in Reinforcement Learning with Mixture Regularization
    Wang, Kaixin
    Kang, Bingyi
    Shao, Jie
    Feng, Jiashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [26] Improving generalization by teacher-directed learning
    Kamimura, R
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 3042 - 3047
  • [27] Improving generalization ability through active learning
    Vijayakumar, S
    Ogawa, H
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1999, E82D (02) : 480 - 487
  • [29] Improving sequential decisions - Efficiently accounting for future learning
    Wang, Lingya
    Oliver, Dean S.
    JOURNAL OF PETROLEUM SCIENCE AND ENGINEERING, 2021, 205
  • [30] Loss Function Learning for Domain Generalization by Implicit Gradient
    Gao, Boyan
    Gouk, Henry
    Yang, Yongxin
    Hospedales, Timothy
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022