A SCALE INVARIANT MEASURE OF FLATNESS FOR DEEP NETWORK MINIMA

Cited by: 3
Authors
Rangamani, Akshay [1 ]
Nguyen, Nam H. [2 ]
Kumar, Abhishek [3 ]
Dzung Phan [2 ]
Chin, Sang [4 ]
Tran, Trac D. [5 ]
Affiliations
[1] MIT, Ctr Brains Minds & Machines, Cambridge, MA 02139 USA
[2] IBM Res, Armonk, NY USA
[3] Google Brain, Mountain View, CA USA
[4] Boston Univ, CS Dept, Boston, MA 02215 USA
[5] Johns Hopkins Univ, ECE Dept, Baltimore, MD 21218 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Deep Learning; Generalization; Flat Minima; Riemannian Quotient Manifolds;
DOI
10.1109/ICASSP39728.2021.9413771
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
It has been empirically observed that the flatness of minima obtained from training deep networks correlates with better generalization. However, for deep networks with positively homogeneous activations, most measures of flatness are not invariant to rescaling of the network parameters. This means that such a measure can be made arbitrarily small or large through rescaling, rendering quantitative comparisons meaningless. In this paper we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure on the parameter space. Using an appropriate Riemannian metric, we propose a Hessian-based measure of flatness that is invariant to rescaling, and we perform simulations to verify this invariance empirically. Finally, we verify that our flatness measure correlates with generalization by using minibatch stochastic gradient descent with different batch sizes to find deep network minima with different generalization properties.
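The rescaling problem the abstract describes can be illustrated concretely. For a two-layer ReLU network (ReLU is positively homogeneous), scaling the first layer by alpha > 0 and the second by 1/alpha leaves the network function and loss unchanged, yet a naive Hessian-based flatness measure such as the Hessian trace changes with alpha. The following is a minimal NumPy sketch on a hypothetical toy network, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network f(x) = w2 . relu(W1 @ x). ReLU is positively
# homogeneous, so scaling W1 by alpha and w2 by 1/alpha preserves f.
W1 = rng.standard_normal((16, 3))
w2 = rng.standard_normal(16)
x = rng.standard_normal(3)
y = 1.0

def relu(z):
    return np.maximum(z, 0.0)

def loss(W1, w2):
    return 0.5 * (w2 @ relu(W1 @ x) - y) ** 2

def hessian_trace(W1, w2):
    """Exact trace of the loss Hessian w.r.t. (W1, w2) at a generic point.

    For L = 0.5 * r^2 with r = w2 . relu(W1 @ x) - y:
      d2L/dw2_j^2  = relu(W1 @ x)_j^2
      d2L/dW1_ij^2 = (w2_i * 1[z_i > 0] * x_j)^2
    (r is locally linear in each individual parameter, so its own
    second derivative vanishes).
    """
    z = W1 @ x
    a = relu(z)
    active = (z > 0).astype(float)
    return np.sum(a ** 2) + np.sum(np.outer(w2 * active, x) ** 2)

alpha = 100.0
W1_s, w2_s = alpha * W1, w2 / alpha  # same function, rescaled parameters

print("losses:", loss(W1, w2), loss(W1_s, w2_s))                    # equal (up to rounding)
print("traces:", hessian_trace(W1, w2), hessian_trace(W1_s, w2_s))  # wildly different
```

Under this rescaling the trace becomes alpha^2 * (w2-block terms) + (1/alpha^2) * (W1-block terms), so it can be driven to any value while the minimum itself is unchanged; the paper's quotient-manifold construction is designed to remove exactly this ambiguity.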
Pages: 1680-1684
Page count: 5
Related Papers
50 in total
  • [21] The scale-invariant space for attention layer in neural network
    Wang, Yue
    Liu, Yuting
    Ma, Zhi-Ming
    NEUROCOMPUTING, 2020, 392: 1-10
  • [22] DeepParticle: Learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method
    Wang, Zhongjian
    Xin, Jack
    Zhang, Zhiwen
    JOURNAL OF COMPUTATIONAL PHYSICS, 2022, 464
  • [23] Deep minima in stellar dynamos
    Brooke, J
    Moss, D
    Phillips, A
    ASTRONOMY & ASTROPHYSICS, 2002, 395(03): 1013-1022
  • [24] A Scale-Invariant Framework For Image Classification With Deep Learning
    Jiang, Yalong
    Chi, Zheru
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017: 1019-1024
  • [25] Accelerating the Network for Deep Learning at Scale
    Klenk, Benjamin
    2020 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON NETWORKS-ON-CHIP (NOCS), 2020
  • [26] Learning scale-variant and scale-invariant features for deep image classification
    van Noord, Nanne
    Postma, Eric
    PATTERN RECOGNITION, 2017, 61: 583-592
  • [27] ABSOLUTE MINIMA OF AN SO(10) INVARIANT HIGGS POTENTIAL
    KAYMAKCALAN, O
    MICHEL, L
    WALI, KC
    MCGLINN, WD
    ORAIFEARTAIGH, L
    NUCLEAR PHYSICS B, 1986, 267(01): 203-230
  • [29] SimUSF: an efficient and effective similarity measure that is invariant to violations of the interval scale assumption
    Fernando, Thilak L.
    Webb, Geoffrey I.
    DATA MINING AND KNOWLEDGE DISCOVERY, 2017, 31(01): 264-286
  • [30] Spanish validation of the cardiac self-efficacy scale: a gender invariant measure
    Arenas, Alicia
    Cuadrado, Esther
    Castillo-Mayen, Rosario
    Luque, Barbara
    Rubio, Sebastian
    Gutierrez-Domingo, Tamara
    Tabernero, Carmen
    PSYCHOLOGY HEALTH & MEDICINE, 2024, 29(02): 334-349