Gradient Monitored Reinforcement Learning

Cited by: 4
Authors
Abdul Hameed, Mohammed Sharafath [1 ]
Chadha, Gavneet Singh [1 ]
Schwung, Andreas [1 ]
Ding, Steven X. [2 ]
Institutions
[1] South Westphalia Univ Appl Sci, Dept Automat Technol, D-59494 Soest, Germany
[2] Univ Duisburg Essen, Dept Automat Control & Complex Syst, D-47057 Duisburg, Germany
Funding
US National Institutes of Health;
Keywords
Training; Monitoring; Neural networks; Reinforcement learning; Optimization; Games; Task analysis; Atari games; deep neural networks (DNNs); gradient monitoring (GM); MuJoCo; multirobot coordination; OpenAI GYM; reinforcement learning (RL);
DOI
10.1109/TNNLS.2021.3119853
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article presents a novel neural network training approach for faster convergence and better generalization in deep reinforcement learning (RL). In particular, we focus on enhancing the training and evaluation performance of RL algorithms by systematically reducing the gradient's variance and, thereby, providing a more targeted learning process. The proposed method, which we term gradient monitoring (GM), steers the learning of the weight parameters of a neural network based on the dynamic development of, and feedback from, the training process itself. We propose different variants of the GM method that we show to increase the underlying performance of the model. One of the proposed variants, momentum with GM (M-WGM), allows for a continuous adjustment of the amount of backpropagated gradient in the network based on certain learning parameters. We further enhance the method with adaptive M-WGM (AM-WGM), which automatically trades off focused learning of certain weights against more dispersed learning depending on the feedback from the rewards collected. As a by-product, it also allows for automatic derivation of the required deep network size during training, as the method automatically freezes trained weights. The method is applied to two discrete tasks (a real-world multirobot coordination problem and Atari games) and one continuous control task (MuJoCo) using advantage actor-critic (A2C) and proximal policy optimization (PPO), respectively. The results particularly underline the applicability of the methods and their performance improvements in terms of generalization capability.
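The core idea the abstract describes — passing only a monitored subset of gradients while freezing the rest — can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the paper's exact algorithm: the momentum-smoothed magnitude estimate and the quantile-based threshold (`keep_ratio`) are assumptions chosen for clarity.

```python
import numpy as np

def gm_mask(grad, running_mag, momentum=0.9, keep_ratio=0.5):
    """Sketch of gradient monitoring: backpropagate only the weights whose
    momentum-smoothed gradient magnitude exceeds a quantile threshold,
    effectively freezing the remaining weights for this step.

    grad        -- current gradient array for one layer
    running_mag -- running (momentum) estimate of per-weight |grad|
    keep_ratio  -- fraction of weights allowed to update (illustrative)
    """
    # Momentum-smoothed estimate of each weight's gradient magnitude.
    running_mag = momentum * running_mag + (1.0 - momentum) * np.abs(grad)
    # Keep only gradients above the (1 - keep_ratio) quantile of magnitudes.
    thresh = np.quantile(running_mag, 1.0 - keep_ratio)
    mask = running_mag >= thresh
    # Masked-out entries receive zero gradient, i.e. those weights are frozen.
    return grad * mask, running_mag
```

In the adaptive variant described above, `keep_ratio` would itself be adjusted from the collected rewards rather than held fixed; weights whose entries stay masked over many steps are effectively pruned, which is how the method can reveal the required network size during training.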
Pages: 4106-4119 (14 pages)
Related Papers (50 total)
  • [21] Policy Gradient using Weak Derivatives for Reinforcement Learning
    Bhatt, Sujay
    Koppel, Alec
    Krishnamurthy, Vikram
    2019 53RD ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2019,
  • [22] On the use of the policy gradient and Hessian in inverse reinforcement learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    INTELLIGENZA ARTIFICIALE, 2020, 14 (01) : 117 - 150
  • [23] Direct gradient-based reinforcement learning for robot behavior learning
    El-Fakdi, Andres
    Carreras, Marc
    Ridao, Pere
    INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS II, 2007, : 175 - +
  • [24] Online reinforcement learning control via discontinuous gradient
    Arellano-Muro, Carlos A.
    Castillo-Toledo, Bernardino
    Di Gennaro, Stefano
    Loukianov, Alexander G.
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2024, 38 (05) : 1762 - 1776
  • [25] A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning
    Pham, Nhan H.
    Nguyen, Lam M.
    Phan, Dzung T.
    Nguyen, Phuong Ha
    van Dijk, Marten
    Tran-Dinh, Quoc
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 374 - 384
  • [26] Independent Policy Gradient Methods for Competitive Reinforcement Learning
    Daskalakis, Constantinos
    Foster, Dylan J.
    Golowich, Noah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [27] Evolution-Guided Policy Gradient in Reinforcement Learning
    Khadka, Shauharda
    Tumer, Kagan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [28] Policy Gradient using Weak Derivatives for Reinforcement Learning
    Bhatt, Sujay
    Koppel, Alec
    Krishnamurthy, Vikram
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 5531 - 5537
  • [29] Total stochastic gradient algorithms and applications in reinforcement learning
    Parmas, Paavo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [30] Variance reduction techniques for gradient estimates in reinforcement learning
    Greensmith, E
    Bartlett, PL
    Baxter, J
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 5 : 1471 - 1530