XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout training, the optimizer always uses gradients taken with respect to the future weights to update the DNN parameters, yielding better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in improving both the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results covering five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
Pages: 6731-6747
Page count: 17
Related Papers
50 records in total
  • [31] A Gradient-Based Implicit Blend
    Gourmel, Olivier
    Barthe, Loic
    Cani, Marie-Paule
    Wyvill, Brian
    Bernhardt, Adrien
    Paulin, Mathias
    Grasberger, Herbert
    ACM TRANSACTIONS ON GRAPHICS, 2013, 32 (02)
  • [32] Gradient-based simulation optimization
    Kim, Sujin
    PROCEEDINGS OF THE 2006 WINTER SIMULATION CONFERENCE, VOLS 1-5, 2006: 159-167
  • [33] Gradient-based shape descriptors
    Capar, Abdulkerim
    Kurt, Binnur
    Gokmen, Muhittin
    MACHINE VISION AND APPLICATIONS, 2009, 20 (06): 365-378
  • [34] Gradient-based Sharpness Function
    Rudnaya, Maria
    Mattheij, Robert
    Maubach, Joseph
    ter Morsche, Hennie
    WORLD CONGRESS ON ENGINEERING, WCE 2011, VOL I, 2011: 301-306
  • [35] Efficient reversible data hiding algorithm based on gradient-based edge direction prediction
    Yang, Wei-Jen
    Chung, Kuo-Liang
    Liao, Hong-Yuan Mark
    Yu, Wen-Kuang
    JOURNAL OF SYSTEMS AND SOFTWARE, 2013, 86 (02): 567-580
  • [36] Prediction of gradient-based similarity functions from the Mellor-Yamada model
    Lobocki, Lech
    Porretta-Tomaszewska, Paola
    QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, 2021, 147 (741): 3922-3939
  • [37] A new energy gradient-based model for LCF life prediction of turbine discs
    Liu, Yunhan
    Zhu, Shun-Peng
    Yu, Zheng-Yong
    Liu, Qiang
    2ND INTERNATIONAL CONFERENCE ON STRUCTURAL INTEGRITY, ICSI 2017, 2017, 5: 856-860
  • [38] Neural network architecture based on gradient boosting for IoT traffic prediction
    Lopez-Martin, Manuel
    Carro, Belen
    Sanchez-Esguevillas, Antonio
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 100: 656-673
  • [39] The energy efficiency prediction method based on Gradient Boosting Regression Tree
    Ma, Hongwei
    Yang, Xin
    Mao, Jianrong
    Zheng, Hao
    2018 2ND IEEE CONFERENCE ON ENERGY INTERNET AND ENERGY SYSTEM INTEGRATION (EI2), 2018
  • [40] A gradient boosting regression based approach for energy consumption prediction in buildings
    Al Bataineh, Ali S.
    ADVANCES IN ENERGY RESEARCH, 2019, 6 (02): 91-101