XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout training, the optimizer always uses gradients taken with respect to the future weights to update the DNN parameters, yielding better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in improving both the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results covering five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
Pages: 6731-6747
Page count: 17
Related Papers
50 records in total
  • [31] A Gradient-Based Implicit Blend
    Gourmel, Olivier
    Barthe, Loic
    Cani, Marie-Paule
    Wyvill, Brian
    Bernhardt, Adrien
    Paulin, Mathias
    Grasberger, Herbert
    ACM TRANSACTIONS ON GRAPHICS, 2013, 32 (02)
  • [32] Gradient-based simulation optimization
    Kim, Sujin
    PROCEEDINGS OF THE 2006 WINTER SIMULATION CONFERENCE, VOLS 1-5, 2006: 159-167
  • [33] Gradient-based shape descriptors
    Capar, Abdulkerim
    Kurt, Binnur
    Gokmen, Muhittin
    MACHINE VISION AND APPLICATIONS, 2009, 20 (06): 365-378
  • [34] Gradient-based Sharpness Function
    Rudnaya, Maria
    Mattheij, Robert
    Maubach, Joseph
    ter Morsche, Hennie
    WORLD CONGRESS ON ENGINEERING, WCE 2011, VOL I, 2011: 301-306
  • [35] Efficient reversible data hiding algorithm based on gradient-based edge direction prediction
    Yang, Wei-Jen
    Chung, Kuo-Liang
    Liao, Hong-Yuan Mark
    Yu, Wen-Kuang
    JOURNAL OF SYSTEMS AND SOFTWARE, 2013, 86 (02): 567-580
  • [36] Prediction of gradient-based similarity functions from the Mellor-Yamada model
    Lobocki, Lech
    Porretta-Tomaszewska, Paola
    QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, 2021, 147 (741): 3922-3939
  • [37] A new energy gradient-based model for LCF life prediction of turbine discs
    Liu, Yunhan
    Zhu, Shun-Peng
    Yu, Zheng-Yong
    Liu, Qiang
    2ND INTERNATIONAL CONFERENCE ON STRUCTURAL INTEGRITY, ICSI 2017, 2017, 5: 856-860
  • [38] Neural network architecture based on gradient boosting for IoT traffic prediction
    Lopez-Martin, Manuel
    Carro, Belen
    Sanchez-Esguevillas, Antonio
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 100: 656-673
  • [39] The energy efficiency prediction method based on Gradient Boosting Regression Tree
    Ma, Hongwei
    Yang, Xin
    Mao, Jianrong
    Zheng, Hao
    2018 2ND IEEE CONFERENCE ON ENERGY INTERNET AND ENERGY SYSTEM INTEGRATION (EI2), 2018
  • [40] A gradient boosting regression based approach for energy consumption prediction in buildings
    Al Bataineh, Ali S.
    ADVANCES IN ENERGY RESEARCH, 2019, 6 (02): 91-101