Gradient Monitored Reinforcement Learning

Cited by: 4
Authors
Abdul Hameed, Mohammed Sharafath [1 ]
Chadha, Gavneet Singh [1 ]
Schwung, Andreas [1 ]
Ding, Steven X. [2 ]
Affiliations
[1] South Westphalia Univ Appl Sci, Dept Automat Technol, D-59494 Soest, Germany
[2] Univ Duisburg Essen, Dept Automat Control & Complex Syst, D-47057 Duisburg, Germany
Funding
US National Institutes of Health;
Keywords
Training; Monitoring; Neural networks; Reinforcement learning; Optimization; Games; Task analysis; Atari games; deep neural networks (DNNs); gradient monitoring (GM); MuJoCo; multirobot coordination; OpenAI GYM; reinforcement learning (RL);
DOI
10.1109/TNNLS.2021.3119853
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article presents a novel neural network training approach for faster convergence and better generalization in deep reinforcement learning (RL). In particular, we focus on enhancing the training and evaluation performance of RL algorithms by systematically reducing the variance of the gradients, thereby providing a more targeted learning process. The proposed method, which we term gradient monitoring (GM), steers the learning of the weight parameters of a neural network based on the dynamic development of, and feedback from, the training process itself. We propose different variants of the GM method and show that they increase the underlying performance of the model. One of the proposed variants, momentum with GM (M-WGM), allows for a continuous adjustment of the quantum of backpropagated gradients in the network based on certain learning parameters. We further enhance the method with adaptive M-WGM (AM-WGM), which automatically trades off focused learning of certain weights against more dispersed learning, depending on the feedback from the rewards collected. As a by-product, it also allows the required deep network size to be derived automatically during training, since the method freezes weights once they are trained. The method is applied to two discrete control tasks (a real-world multirobot coordination problem and Atari games) and one continuous control task (MuJoCo) using advantage actor-critic (A2C) and proximal policy optimization (PPO), respectively. The results particularly underline the applicability of the methods and their performance improvements in terms of generalization capability.
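The core idea of the abstract, monitoring gradient magnitudes and freezing small ones so that backpropagation concentrates on the dominant weights, can be sketched roughly as follows. This is a minimal illustrative sketch only: the function name `m_wgm_step`, the `beta`/`kappa` parameters, and the mean-based thresholding rule are assumptions for exposition, not the paper's exact M-WGM formulation.

```python
import numpy as np

def m_wgm_step(grad, mom, beta=0.9, kappa=0.5):
    """One momentum-based gradient-masking step (illustrative sketch).

    Maintains an exponential moving average `mom` of gradient
    magnitudes and zeroes ("freezes") gradients whose magnitude
    falls below a fraction `kappa` of the average magnitude.
    """
    mom = beta * mom + (1.0 - beta) * np.abs(grad)   # momentum of |grad|
    threshold = kappa * mom.mean()                   # adaptive cutoff
    mask = np.abs(grad) >= threshold                 # weights still allowed to learn
    return grad * mask, mom

# Toy usage: small gradient components are suppressed, large ones pass through.
g = np.array([0.5, 0.01, -0.8, 0.02])
masked, mom = m_wgm_step(g, np.zeros_like(g))
print(masked)  # the 0.01 component is frozen to 0.0
```

In an actual RL training loop such a mask would be applied per layer between the backward pass and the optimizer step; the adaptive variant described in the abstract would additionally adjust `kappa` from the reward feedback.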
Pages: 4106-4119
Page count: 14
Related Papers
(items 41-50 of 50)
  • [41] Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
    Morimura, Tetsuro
    Uchibe, Eiji
    Yoshimoto, Junichiro
    Peters, Jan
    Doya, Kenji
    NEURAL COMPUTATION, 2010, 22 (02) : 342 - 376
  • [42] Gradient based method for symmetric and asymmetric multiagent reinforcement learning
    Könönen, V
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690 : 68 - 75
  • [43] Shaping multi-agent systems with gradient reinforcement learning
    Buffet, Olivier
    Dutech, Alain
    Charpillet, Francois
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2007, 15 (02) : 197 - 220
  • [44] Gradient Imitation Reinforcement Learning for Low Resource Relation Extraction
    Hu, Xuming
    Zhang, Chenwei
    Yang, Yawen
    Li, Xiaohe
    Lin, Li
    Wen, Lijie
    Yu, Philip S.
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 2737 - 2746
  • [45] Hessian matrix distribution for Bayesian policy gradient reinforcement learning
    Ngo Anh Vien
    Yu, Hwanjo
    Chung, TaeChoong
    INFORMATION SCIENCES, 2011, 181 (09) : 1671 - 1685
  • [46] Inverse Reinforcement Learning from a Gradient-based Learner
    Ramponi, Giorgia
    Drappo, Gianluca
    Restelli, Marcello
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [47] A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
    Liu, Bo
    Feng, Xidong
    Ren, Jie
    Mai, Luo
    Zhu, Rui
    Zhang, Haifeng
    Wang, Jun
    Yang, Yaodong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [48] Using policy gradient reinforcement learning on autonomous robot controllers
    Grudic, GZ
    Kumar, V
    Ungar, L
    IROS 2003: PROCEEDINGS OF THE 2003 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-4, 2003, : 406 - 411
  • [49] Spiking Variational Policy Gradient for Brain Inspired Reinforcement Learning
    Yang, Zhile
    Guo, Shangqi
    Fang, Ying
    Yu, Zhaofei
    Liu, Jian K.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (03) : 1975 - 1990
  • [50] Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks
    Pan, Yuhao
    Wang, Xiucheng
    Cheng, Nan
    Qiu, Qi
    2024 INTERNATIONAL CONFERENCE ON UBIQUITOUS COMMUNICATION, UCOM 2024, 2024, : 192 - 196