Demystify Hyperparameters for Stochastic Optimization with Transferable Representations

Cited by: 5
Authors
Sun, Jianhui [1 ]
Huai, Mengdi [1 ]
Jha, Kishlay [1 ]
Zhang, Aidong [1 ]
Affiliation
[1] Univ Virginia, Charlottesville, VA 22903 USA
Funding
National Science Foundation (NSF), USA;
Keywords
Deep Learning Optimization; Fine-tuning; Stochastic Gradient Descent; AutoML; Generalization Bound;
DOI
10.1145/3534678.3539298
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
This paper studies the convergence and generalization of a large class of Stochastic Gradient Descent (SGD) momentum schemes, in both learning from scratch and transferring representations with fine-tuning. Momentum-based acceleration of SGD is the default optimizer for many deep learning models. However, general convergence guarantees are lacking for many existing momentum variants when combined with stochastic gradients, and it is also unclear how momentum methods affect the generalization error. In this paper, we give a unified analysis of several popular optimizers, e.g., Polyak's heavy ball momentum and Nesterov's accelerated gradient. Our contribution is threefold. First, we give a unified convergence guarantee for a large class of momentum variants in the stochastic setting. Notably, our results cover both convex and nonconvex objectives. Second, we prove a generalization bound for neural networks trained by momentum variants. We analyze how hyperparameters affect the generalization bound and consequently propose guidelines on how to tune these hyperparameters in various momentum schemes to generalize well. We provide extensive empirical evidence for our proposed guidelines. Third, this study fills the gap of a formal analysis of fine-tuning in the literature. To the best of our knowledge, our work is the first systematic generalizability analysis of momentum methods that covers both learning from scratch and fine-tuning. Our code is available (1).
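For orientation, the two momentum schemes named in the abstract follow the classical update rules sketched below. This is a minimal NumPy illustration of the textbook updates, not the paper's implementation; the function names and the hyperparameter values lr (step size) and mu (momentum coefficient) are illustrative assumptions.

```python
import numpy as np

def heavy_ball_step(x, v, grad_fn, lr=0.01, mu=0.9):
    """Polyak's heavy ball: gradient is evaluated at the current iterate x."""
    v = mu * v - lr * grad_fn(x)
    return x + v, v

def nesterov_step(x, v, grad_fn, lr=0.01, mu=0.9):
    """Nesterov's accelerated gradient: gradient is evaluated at the look-ahead point x + mu*v."""
    v = mu * v - lr * grad_fn(x + mu * v)
    return x + v, v

# Toy usage on the quadratic objective f(x) = 0.5 * ||x||^2, whose gradient is x.
grad = lambda x: x
x, v = np.ones(3), np.zeros(3)
for _ in range(100):
    x, v = nesterov_step(x, v, grad)
print(x)  # iterates move toward the minimizer 0
```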
Pages: 1706-1716
Number of pages: 11