Does Momentum Change the Implicit Regularization on Separable Data?

Cited by: 0
Authors
Wang, Bohan [1 ]
Meng, Qi [2 ]
Zhang, Huishuai [2 ]
Sun, Ruoyu [3 ]
Chen, Wei [4 ]
Ma, Zhi-Ming [4 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[4] Chinese Acad Sci, Beijing, Peoples R China
Keywords
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The momentum acceleration technique is widely adopted in many optimization algorithms. However, there is no theoretical answer to how momentum affects the generalization performance of these algorithms. This paper studies the problem by analyzing the implicit regularization of momentum-based optimization. We prove that on the linear classification problem with separable data and an exponential-tailed loss, gradient descent with momentum (GDM) converges to the L-2 max-margin solution, the same solution as vanilla gradient descent. This means that gradient descent with momentum acceleration still converges to a low-complexity model, which guarantees its generalization. We then analyze the stochastic and adaptive variants of GDM (i.e., SGDM and deterministic Adam) and show that they also converge to the L-2 max-margin solution. Technically, the implicit regularization of SGDM is established through a novel convergence analysis of SGDM under a general noise condition called the affine noise variance condition. To the best of our knowledge, we are the first to derive SGDM's convergence under such an assumption. Numerical experiments are conducted to support our theoretical results.
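The abstract's setting can be illustrated numerically. The sketch below is not the paper's code; it is a minimal, assumed reproduction of the setup described: linear classification on synthetic separable data with the exponential loss, optimized by heavy-ball gradient descent with momentum. All constants (cluster centers, step size, momentum coefficient, iteration count) are illustrative assumptions.

```python
import numpy as np

# Assumed setup: two well-separated Gaussian clusters, labels +1 / -1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 0.5, (20, 2)),
               rng.normal(-2.0, 0.5, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

w = np.zeros(2)          # linear classifier weights
v = np.zeros(2)          # momentum buffer
lr, beta = 0.05, 0.9     # step size and momentum coefficient (assumed)

for _ in range(5000):
    margins = y * (X @ w)
    # Gradient of the exponential loss mean(exp(-y * <w, x>)).
    grad = -(y[:, None] * X * np.exp(-margins)[:, None]).mean(axis=0)
    v = beta * v + grad  # heavy-ball momentum update
    w = w - lr * v

# The paper's claim concerns the *direction* of w: it converges to the
# L-2 max-margin direction (||w|| itself grows without bound on
# separable data, since the exponential loss has no finite minimizer).
direction = w / np.linalg.norm(w)
print("normalized GDM iterate:", direction)
print("training accuracy:", (np.sign(X @ w) == y).mean())
```

On this toy data the iterate separates the training set perfectly, and its normalized direction can be compared against the max-margin (hard-margin SVM) direction to observe the implicit bias described in the abstract.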
Pages: 13