Demystify Hyperparameters for Stochastic Optimization with Transferable Representations

Cited by: 5
Authors
Sun, Jianhui [1 ]
Huai, Mengdi [1 ]
Jha, Kishlay [1 ]
Zhang, Aidong [1 ]
Affiliation
[1] Univ Virginia, Charlottesville, VA 22903 USA
Funding
National Science Foundation (NSF), USA;
Keywords
Deep Learning Optimization; Fine-tuning; Stochastic Gradient Descent; AutoML; Generalization Bound;
DOI
10.1145/3534678.3539298
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
This paper studies the convergence and generalization of a large class of Stochastic Gradient Descent (SGD) momentum schemes, in both learning from scratch and transferring representations with fine-tuning. Momentum-based acceleration of SGD is the default optimizer for many deep learning models. However, general convergence guarantees are lacking for many existing momentum variants when combined with stochastic gradients, and it is also unclear how momentum methods affect the generalization error. In this paper, we give a unified analysis of several popular optimizers, e.g., Polyak's heavy ball momentum and Nesterov's accelerated gradient. Our contribution is threefold. First, we give a unified convergence guarantee for a large class of momentum variants in the stochastic setting. Notably, our results cover both convex and nonconvex objectives. Second, we prove a generalization bound for neural networks trained by momentum variants. We analyze how hyperparameters affect the generalization bound and consequently propose guidelines on how to tune these hyperparameters in various momentum schemes to generalize well. We provide extensive empirical evidence for our proposed guidelines. Third, this study fills the gap of a formal analysis of fine-tuning in the literature. To the best of our knowledge, our work is the first systematic generalizability analysis of momentum methods that covers both learning from scratch and fine-tuning. Our code is available (1).
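For orientation, the two momentum schemes named in the abstract follow the classical update rules sketched below. This is a minimal NumPy illustration of the textbook updates, not the paper's implementation; the function names and the hyperparameter values lr (step size) and mu (momentum coefficient) are illustrative assumptions.

```python
import numpy as np

def heavy_ball_step(x, v, grad_fn, lr=0.01, mu=0.9):
    """Polyak's heavy ball: gradient is evaluated at the current iterate x."""
    v = mu * v - lr * grad_fn(x)
    return x + v, v

def nesterov_step(x, v, grad_fn, lr=0.01, mu=0.9):
    """Nesterov's accelerated gradient: gradient is evaluated at the look-ahead point x + mu*v."""
    v = mu * v - lr * grad_fn(x + mu * v)
    return x + v, v

# Toy usage on the quadratic objective f(x) = 0.5 * ||x||^2, whose gradient is x.
grad = lambda x: x
x, v = np.ones(3), np.zeros(3)
for _ in range(100):
    x, v = nesterov_step(x, v, grad)
print(x)  # iterates move toward the minimizer 0
```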
Pages: 1706-1716
Number of pages: 11