XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based; optimizer; weight prediction;
DOI
10.1109/TPAMI.2024.3387399
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to improve their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout training, the optimizer always uses the gradients w.r.t. the future weights to update the DNN parameters, yielding better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective at accelerating the convergence of gradient-based optimizers and improving the accuracy of DNN models. Empirical results for five popular optimizers, SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experiments validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
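The weight-prediction idea described in the abstract can be sketched for SGD with momentum: before each mini-batch, extrapolate the weights one update step ahead using the optimizer's own update rule, evaluate the gradient at those predicted weights, and then apply the ordinary update. The following is an illustrative reconstruction under simplifying assumptions, not the authors' implementation; the helper names (`predict_weights_sgdm`, `train_step`) and the use of the current gradient as a stand-in when forming the prediction are ours.

```python
import numpy as np

def predict_weights_sgdm(w, velocity, grad, lr=0.1, momentum=0.9, steps=1):
    """Predict the weights `steps` updates ahead under SGD with momentum,
    holding the supplied gradient fixed (a simplifying assumption)."""
    w_hat, v_hat = w.copy(), velocity.copy()
    for _ in range(steps):
        v_hat = momentum * v_hat + grad   # momentum-buffer update rule
        w_hat = w_hat - lr * v_hat        # weight update rule
    return w_hat

def train_step(w, velocity, grad_fn, lr=0.1, momentum=0.9):
    """One XGrad-style step: evaluate the gradient at the *predicted*
    future weights, then apply the usual SGD-with-momentum update to w."""
    g_now = grad_fn(w)                    # gradient used to form the prediction
    w_hat = predict_weights_sgdm(w, velocity, g_now, lr, momentum)
    g = grad_fn(w_hat)                    # forward/backward at predicted weights
    velocity = momentum * velocity + g
    w = w - lr * velocity
    return w, velocity

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(200):
    w, v = train_step(w, v, lambda x: x)
```

On this quadratic toy problem the iterates contract toward zero, illustrating that updating with gradients taken at the predicted future weights is a stable variant of the base optimizer.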
Pages: 6731-6747 (17 pages)