Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

Cited by: 0
Authors
Gurumurthy, Swaminathan [1 ]
Manchester, Zachary [1 ]
Kolter, J. Zico [1 ,2 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Bosch Ctr AI, Sunnyvale, CA USA
Keywords
Reinforcement learning; Actor critic; Continuous control; Highly parallel environments
DOI
Not available
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
On-policy reinforcement learning algorithms have been shown to be remarkably efficient at learning policies for continuous-control robotics tasks. They are highly parallelizable and hence have benefited tremendously from the recent rise of GPU-based parallel simulators. The most widely used on-policy reinforcement learning algorithm is proximal policy optimization (PPO), which was introduced in 2017 and designed for a somewhat different setting with CPU-based serial or less parallelizable simulators. Surprisingly, however, it has maintained its dominance even on tasks built on today's highly parallelizable simulators. In this paper, we show that a different class of on-policy algorithms, which estimate the policy gradient using the critic's action gradients, is better suited to highly parallelizable simulators. The primary issues for these algorithms arise from the lack of diversity in the on-policy experiences used for updates and from instabilities caused by the interaction between the biased critic gradients and the rapidly changing policy distribution. We address the former by simply increasing the number of parallel simulation runs (thanks to GPU-based simulators) together with an appropriate schedule on the policy entropy to ensure sample diversity. We address the latter by adding a policy-averaging step and a value-averaging step (as in off-policy methods). With these modifications, we observe that the critic-gradient-based on-policy method (CGAC) consistently achieves higher episode returns than existing baselines. Furthermore, in environments with high-dimensional action spaces, CGAC also trains much faster in wall-clock time than the corresponding baselines.
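The recipe the abstract describes (an actor updated through the critic's action gradient, stabilized by policy- and value-averaging steps and an entropy schedule, on batches from many parallel environments) can be sketched as follows. This is a minimal illustrative sketch of that general approach in PyTorch, not the paper's implementation; the network sizes, learning rates, Polyak rate TAU, and the linear entropy schedule are all assumptions.

import torch
import torch.nn as nn

NUM_ENVS, OBS_DIM, ACT_DIM = 4096, 32, 8   # hypothetical dimensions
GAMMA, TAU = 0.99, 0.005                   # discount and averaging rate (assumed)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.Tanh(), nn.Linear(256, out_dim))

actor = mlp(OBS_DIM, ACT_DIM)                          # outputs the action mean
log_std = nn.Parameter(torch.full((ACT_DIM,), -1.0))   # learnable exploration noise
critic = mlp(OBS_DIM + ACT_DIM, 1)                     # Q(s, a)

# Slow "averaged" copies used for targets, as in off-policy target networks.
actor_avg, critic_avg = mlp(OBS_DIM, ACT_DIM), mlp(OBS_DIM + ACT_DIM, 1)
actor_avg.load_state_dict(actor.state_dict())
critic_avg.load_state_dict(critic.state_dict())

actor_opt = torch.optim.Adam(list(actor.parameters()) + [log_std], lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

@torch.no_grad()
def polyak(avg_net, net):
    # avg <- (1 - TAU) * avg + TAU * net
    for p_avg, p in zip(avg_net.parameters(), net.parameters()):
        p_avg.mul_(1.0 - TAU).add_(p, alpha=TAU)

def entropy_coef(step, total_steps, start=1e-2, end=1e-4):
    # Linear decay standing in for the paper's entropy schedule.
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

def update(obs, act, rew, next_obs, done, step, total_steps):
    # Critic update: one-step TD target computed with the averaged networks.
    with torch.no_grad():
        next_act = actor_avg(next_obs)
        target_q = rew + GAMMA * (1.0 - done) * critic_avg(
            torch.cat([next_obs, next_act], dim=-1)).squeeze(-1)
    q = critic(torch.cat([obs, act], dim=-1)).squeeze(-1)
    critic_loss = (q - target_q).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: backpropagate the critic's action gradient through a
    # reparameterized sample, plus a scheduled entropy bonus for diversity.
    dist = torch.distributions.Normal(actor(obs), log_std.exp())
    pi_act = dist.rsample()
    actor_loss = (-critic(torch.cat([obs, pi_act], dim=-1)).mean()
                  - entropy_coef(step, total_steps) * dist.entropy().mean())
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Policy-averaging and value-averaging steps.
    polyak(actor_avg, actor)
    polyak(critic_avg, critic)

# One update on a random batch, standing in for a step of on-policy data
# gathered from NUM_ENVS parallel simulator instances.
obs, next_obs = torch.randn(NUM_ENVS, OBS_DIM), torch.randn(NUM_ENVS, OBS_DIM)
act = torch.randn(NUM_ENVS, ACT_DIM)
rew, done = torch.randn(NUM_ENVS), torch.zeros(NUM_ENVS)
update(obs, act, rew, next_obs, done, step=0, total_steps=10_000)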
Pages: 14