AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for neural networks

Cited by: 5
Authors
Sun, Hao [1 ]
Shen, Li [2 ]
Zhong, Qihuang [3 ]
Ding, Liang [2 ]
Chen, Shixiang [4 ]
Sun, Jingwei [1 ]
Li, Jing [1 ]
Sun, Guangzhong [1 ]
Tao, Dacheng [5 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci, Hefei 230026, Anhui, Peoples R China
[2] JD com, Beijing, Peoples R China
[3] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Hubei, Peoples R China
[4] Univ Sci & Technol China, Sch Math Sci, Hefei 230026, Anhui, Peoples R China
[5] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
Keywords
Sharpness-aware minimization; Adaptive learning rate; Non-convex optimization; Momentum acceleration; Linear speedup; Convergence
DOI
10.1016/j.neunet.2023.10.044
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The sharpness-aware minimization (SAM) optimizer has been extensively explored because it improves generalization when training deep neural networks by introducing an extra perturbation step that flattens the loss landscape. Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically for training large-scale deep neural networks, but without a theoretical guarantee, owing to the triple difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step. In this paper, we analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We theoretically show that AdaSAM admits an O(1/√(bT)) convergence rate, which achieves the linear speedup property with respect to the mini-batch size b. Specifically, to decouple the stochastic gradient steps from the adaptive learning rate and the perturbed gradient, we introduce a delayed second-order momentum term that makes them independent when taking expectations during the analysis. We then bound them by showing that the adaptive learning rate has a bounded range, which makes our analysis feasible. To the best of our knowledge, we are the first to provide a non-trivial convergence rate for SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks and a synthetic task, which show that AdaSAM achieves superior performance compared with the SGD, AMSGrad, and SAM optimizers.
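The update the abstract describes can be illustrated with a short, self-contained sketch. The Python/NumPy snippet below is a minimal illustration, not the paper's exact pseudocode: it combines the SAM perturbation step with Adam-style first- and second-moment estimates (bias correction omitted) on a toy quadratic loss. The toy objective, function names, and hyperparameter values (rho, lr, beta1, beta2) are illustrative assumptions.

# Minimal sketch of an AdaSAM-style step on a toy quadratic objective.
# Assumptions (not the paper's exact pseudocode): Adam-style moment
# estimates without bias correction; rho, lr, beta1, beta2 are illustrative.
import numpy as np

def loss_and_grad(x, A, b):
    # Quadratic toy loss 0.5*||Ax - b||^2 and its gradient (stand-in for a mini-batch loss).
    r = A @ x - b
    return 0.5 * r @ r, A.T @ r

def adasam_step(x, m, v, A, b, lr=1e-2, rho=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # 1) Stochastic gradient at the current point.
    _, g = loss_and_grad(x, A, b)
    # 2) SAM perturbation: ascend along the normalized gradient.
    x_adv = x + rho * g / (np.linalg.norm(g) + 1e-12)
    # 3) Gradient at the perturbed point (the sharpness-aware gradient).
    _, g_adv = loss_and_grad(x_adv, A, b)
    # 4) Momentum (first moment) and adaptive second moment, Adam-style.
    m = beta1 * m + (1 - beta1) * g_adv
    v = beta2 * v + (1 - beta2) * g_adv**2
    # 5) Adaptive update using the perturbed gradient's moments.
    x = x - lr * m / (np.sqrt(v) + eps)
    return x, m, v

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x, m, v = np.zeros(5), np.zeros(5), np.zeros(5)
for t in range(200):
    x, m, v = adasam_step(x, m, v, A, b)
print("final loss:", loss_and_grad(x, A, b)[0])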
Pages: 506 - 519
Page count: 14
Related papers
50 records in total
  • [31] ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition
    Zhou, Yixuan
    Qu, Yi
    Xu, Xing
    Shen, Hengtao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11311 - 11321
  • [32] Optimal Rho Value Selection Based on the Sharpness-Aware Minimization Procedure
    Shen, Aoran
    Software (软件), 2023, (01) : 126 - 129
  • [33] Enhancing Fine-Tuning based Backdoor Defense with Sharpness-Aware Minimization
    Zhu, Mingli
    Wei, Shaokui
    Shen, Li
    Fan, Yanbo
    Wu, Baoyuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 4443 - 4454
  • [34] Adaptive Learning Rate and Momentum for Training Deep Neural Networks
    Hao, Zhiyong
    Jiang, Yixuan
    Yu, Huihua
    Chiang, Hsiao-Dong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT III, 2021, 12977 : 381 - 396
  • [35] The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
    Bartlett, Peter L.
    Long, Philip M.
    Bousquet, Olivier
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [36] Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition
    Zhou, Zhipeng
    Li, Lanqing
    Zhao, Peilin
    Heng, Pheng-Ann
    Gong, Wei
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 3499 - 3509
  • [37] SAR: Sharpness-Aware minimization for enhancing DNNs’ Robustness against bit-flip errors
    Zhou, Changbao
    Du, Jiawei
    Yan, Ming
    Yue, Hengshan
    Wei, Xiaohui
    Zhou, Joey Tianyi
    Journal of Systems Architecture, 2024, 156
  • [38] Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning
    Abbas, Momin
    Xiao, Quan
    Chen, Lisha
    Chen, Pin-Yu
    Chen, Tianyi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 10 - 32
  • [39] Generalized Federated Learning via Sharpness Aware Minimization
    Qu, Zhe
    Li, Xingyu
    Duan, Rui
    Liu, Yao
    Tang, Bo
    Lu, Zhuo
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [40] Learning with noisy labels via clean aware sharpness aware minimization
    Huang, Bin
    Xie, Ying
    Xu, Chaoyang
    Scientific Reports, 15 (1)