Gradient Monitored Reinforcement Learning

Cited by: 4

Authors:

Abdul Hameed, Mohammed Sharafath [1]

Chadha, Gavneet Singh [1]

Schwung, Andreas [1]

Ding, Steven X. [2]

Affiliations:

[1] South Westphalia Univ Appl Sci, Dept Automat Technol, D-59494 Soest, Germany

[2] Univ Duisburg Essen, Dept Automat Control & Complex Syst, D-47057 Duisburg, Germany

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2023 / Vol. 34 / No. 8

Funding:

National Institutes of Health (USA);

Keywords:

Training; Monitoring; Neural networks; Reinforcement learning; Optimization; Games; Task analysis; Atari games; deep neural networks (DNNs); gradient monitoring (GM); MuJoCo; multirobot coordination; OpenAI GYM; reinforcement learning (RL);

DOI:

10.1109/TNNLS.2021.3119853

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article presents a novel neural network training approach for faster convergence and better generalization in deep reinforcement learning (RL). In particular, we focus on enhancing the training and evaluation performance of RL algorithms by systematically reducing the variance of the gradients, thereby providing a more targeted learning process. The proposed method, which we term gradient monitoring (GM), steers the learning of the weight parameters of a neural network based on the dynamic development of, and feedback from, the training process itself. We propose different variants of the GM method that we prove to increase the underlying performance of the model. One of the proposed variants, momentum with GM (M-WGM), allows for a continuous adjustment of the quantum of backpropagated gradients in the network based on certain learning parameters. We further enhance the method with the adaptive M-WGM (AM-WGM) method, which automatically adjusts between focused learning of certain weights and more dispersed learning, depending on the feedback from the rewards collected. As a by-product, it also allows for automatic derivation of the required deep network sizes during training, as the method automatically freezes trained weights. The method is applied to two discrete control tasks (real-world multirobot coordination problems and Atari games) and one continuous control task (MuJoCo) using advantage actor-critic (A2C) and proximal policy optimization (PPO), respectively. The results obtained particularly underline the applicability and performance improvements of the methods in terms of generalization capability.

Citation

Pages: 4106-4119

Number of pages: 14
