XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
CLC classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to improve their convergence and generalization when training deep neural network (DNN) models. In particular, before each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout the whole training period, the optimizer always utilizes the gradients w.r.t. the future weights to update the DNN parameters, achieving better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet highly effective in boosting both the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results covering five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
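The abstract describes the core mechanism: before each mini-batch, predict the future weights by replaying the optimizer's update rule, evaluate the gradient at those predicted weights, and then apply the optimizer's usual update to the original weights. The following is a minimal NumPy sketch of that idea for SGD with momentum only; it is a simplified illustration under stated assumptions (the function name `xgrad_sgdm_step`, the toy quadratic loss, and the prediction using the momentum buffer alone are this sketch's choices, not the authors' implementation).

```python
import numpy as np

def xgrad_sgdm_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """One XGrad-style step for SGD with momentum (simplified sketch).

    Weight prediction: before computing the gradient, estimate the
    future weights by replaying the optimizer's update rule with the
    current momentum buffer, then evaluate the gradient there.
    """
    # Predict the next weights from the momentum buffer alone
    # (the gradient of the coming step is not yet known).
    w_pred = w - lr * mu * v
    # Forward/backward pass is done at the *predicted* weights.
    g = grad_fn(w_pred)
    # Standard SGD-momentum update, applied to the original weights.
    v_new = mu * v + g
    w_new = w - lr * v_new
    return w_new, v_new

# Toy quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w itself.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = xgrad_sgdm_step(w, v, grad_fn)
# w contracts toward the minimizer at the origin.
```

For this particular optimizer the sketch coincides with a Nesterov-style lookahead; the point of the framework, per the abstract, is that the same prediction idea generalizes to the update rules of Adam, AdamW, AdaBelief, and AdaM3.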
Pages: 6731-6747
Page count: 17