A SCALE INVARIANT MEASURE OF FLATNESS FOR DEEP NETWORK MINIMA

Cited by: 3
Authors
Rangamani, Akshay [1 ]
Nguyen, Nam H. [2 ]
Kumar, Abhishek [3 ]
Dzung Phan [2 ]
Chin, Sang [4 ]
Tran, Trac D. [5 ]
Affiliations
[1] MIT, Ctr Brains Minds & Machines, Cambridge, MA 02139 USA
[2] IBM Res, Armonk, NY USA
[3] Google Brain, Mountain View, CA USA
[4] Boston Univ, CS Dept, Boston, MA 02215 USA
[5] Johns Hopkins Univ, ECE Dept, Baltimore, MD 21218 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Deep Learning; Generalization; Flat Minima; Riemannian Quotient Manifolds;
DOI
10.1109/ICASSP39728.2021.9413771
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
It has been empirically observed that the flatness of minima obtained from training deep networks correlates with better generalization. However, for deep networks with positively homogeneous activations, most measures of flatness are not invariant to rescaling of the network parameters. This means that such a measure can be made arbitrarily small or large through rescaling, rendering quantitative comparisons meaningless. In this paper we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure on the parameter space. Using an appropriate Riemannian metric, we propose a Hessian-based measure of flatness that is invariant to rescaling, and we perform simulations to verify this invariance empirically. Finally, we verify that our flatness measure correlates with generalization by using minibatch stochastic gradient descent with different batch sizes to find deep network minima with different generalization properties.
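The rescaling problem the abstract describes can be illustrated concretely. For a two-layer ReLU network (ReLU is positively homogeneous), scaling the first layer by alpha > 0 and the second by 1/alpha leaves the network function and loss unchanged, yet a naive Hessian-based flatness measure such as the Hessian trace changes with alpha. The following is a minimal NumPy sketch on a hypothetical toy network, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network f(x) = w2 . relu(W1 @ x). ReLU is positively
# homogeneous, so scaling W1 by alpha and w2 by 1/alpha preserves f.
W1 = rng.standard_normal((16, 3))
w2 = rng.standard_normal(16)
x = rng.standard_normal(3)
y = 1.0

def relu(z):
    return np.maximum(z, 0.0)

def loss(W1, w2):
    return 0.5 * (w2 @ relu(W1 @ x) - y) ** 2

def hessian_trace(W1, w2):
    """Exact trace of the loss Hessian w.r.t. (W1, w2) at a generic point.

    For L = 0.5 * r^2 with r = w2 . relu(W1 @ x) - y:
      d2L/dw2_j^2  = relu(W1 @ x)_j^2
      d2L/dW1_ij^2 = (w2_i * 1[z_i > 0] * x_j)^2
    (r is locally linear in each individual parameter, so its own
    second derivative vanishes).
    """
    z = W1 @ x
    a = relu(z)
    active = (z > 0).astype(float)
    return np.sum(a ** 2) + np.sum(np.outer(w2 * active, x) ** 2)

alpha = 100.0
W1_s, w2_s = alpha * W1, w2 / alpha  # same function, rescaled parameters

print("losses:", loss(W1, w2), loss(W1_s, w2_s))                    # equal (up to rounding)
print("traces:", hessian_trace(W1, w2), hessian_trace(W1_s, w2_s))  # wildly different
```

Under this rescaling the trace becomes alpha^2 * (w2-block terms) + (1/alpha^2) * (W1-block terms), so it can be driven to any value while the minimum itself is unchanged; the paper's quotient-manifold construction is designed to remove exactly this ambiguity.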
Pages: 1680-1684
Page count: 5
Related Papers
50 in total
  • [21] The scale-invariant space for attention layer in neural network
    Wang, Yue
    Liu, Yuting
    Ma, Zhi-Ming
    NEUROCOMPUTING, 2020, 392: 1-10
  • [22] DeepParticle: Learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method
    Wang, Zhongjian
    Xin, Jack
    Zhang, Zhiwen
    JOURNAL OF COMPUTATIONAL PHYSICS, 2022, 464
  • [23] Deep minima in stellar dynamos
    Brooke, J
    Moss, D
    Phillips, A
    ASTRONOMY & ASTROPHYSICS, 2002, 395(03): 1013-1022
  • [24] A Scale-Invariant Framework For Image Classification With Deep Learning
    Jiang, Yalong
    Chi, Zheru
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017: 1019-1024
  • [25] Accelerating the Network for Deep Learning at Scale
    Klenk, Benjamin
    2020 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON NETWORKS-ON-CHIP (NOCS), 2020
  • [26] Learning scale-variant and scale-invariant features for deep image classification
    van Noord, Nanne
    Postma, Eric
    PATTERN RECOGNITION, 2017, 61: 583-592
  • [27] ABSOLUTE MINIMA OF AN SO(10) INVARIANT HIGGS POTENTIAL
    KAYMAKCALAN, O
    MICHEL, L
    WALI, KC
    MCGLINN, WD
    ORAIFEARTAIGH, L
    NUCLEAR PHYSICS B, 1986, 267(01): 203-230
  • [29] SimUSF: an efficient and effective similarity measure that is invariant to violations of the interval scale assumption
    Fernando, Thilak L.
    Webb, Geoffrey I.
    DATA MINING AND KNOWLEDGE DISCOVERY, 2017, 31(01): 264-286
  • [30] Spanish validation of the cardiac self-efficacy scale: a gender invariant measure
    Arenas, Alicia
    Cuadrado, Esther
    Castillo-Mayen, Rosario
    Luque, Barbara
    Rubio, Sebastian
    Gutierrez-Domingo, Tamara
    Tabernero, Carmen
    PSYCHOLOGY HEALTH & MEDICINE, 2024, 29(02): 334-349