XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, before each mini-batch is processed, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout training, the optimizer always uses gradients computed w.r.t. the future weights to update the DNN parameters, yielding better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting both the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results on five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal and validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
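To illustrate the weight-prediction idea described in the abstract, the following is a minimal sketch that wraps one training step of PyTorch's SGD with momentum. The helper names (predict_weights, restore_weights) and the one-step look-ahead w_hat = w - lr * momentum * buffer are assumptions made for illustration only; the paper's XGrad framework also covers Adam-style optimizers and its exact prediction rule may differ.

```python
import torch

def predict_weights(optimizer):
    """Cache the current weights, then move each parameter to a predicted
    future value using the optimizer's own update rule (here: momentum SGD,
    approximated by w_hat = w - lr * momentum * momentum_buffer)."""
    backup = {}
    for group in optimizer.param_groups:
        lr, mu = group["lr"], group["momentum"]
        for p in group["params"]:
            backup[p] = p.detach().clone()
            buf = optimizer.state.get(p, {}).get("momentum_buffer")
            if buf is not None:  # no buffer exists before the first real step
                p.data.add_(buf, alpha=-lr * mu)
    return backup

def restore_weights(optimizer, backup):
    """Copy the cached (current) weights back before the real optimizer step."""
    for group in optimizer.param_groups:
        for p in group["params"]:
            p.data.copy_(backup[p])

# Hypothetical usage inside a training loop (model, loss_fn, loader assumed to exist):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# for x, y in loader:
#     backup = predict_weights(optimizer)   # forward/backward run at predicted weights
#     loss = loss_fn(model(x), y)
#     optimizer.zero_grad()
#     loss.backward()                       # gradients are w.r.t. the future weights
#     restore_weights(optimizer, backup)    # the update itself starts from the current weights
#     optimizer.step()
```

Restoring the cached weights before optimizer.step() keeps the update anchored at the current weights while the gradient is the one evaluated at the predicted weights, which is the core of the weight-prediction scheme sketched above.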
Pages: 6731-6747
Number of pages: 17