Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

Cited by: 0
Authors
Gurumurthy, Swaminathan [1 ]
Manchester, Zachary [1 ]
Kolter, J. Zico [1 ,2 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Bosch Ctr AI, Sunnyvale, CA USA
Keywords
Reinforcement Learning; Actor Critic; Continuous control; Highly parallel Environments;
DOI
Not available
CLC Number
TP [Automation technology, computer technology];
Discipline Code
0812
Abstract
On-policy reinforcement learning algorithms have proven remarkably efficient at learning policies for continuous-control robotics tasks. They are highly parallelizable and have therefore benefited tremendously from the recent rise of GPU-based parallel simulators. The most widely used on-policy algorithm is proximal policy optimization (PPO), which was introduced in 2017 and designed for a somewhat different setting: CPU-based serial or less parallelizable simulators. Surprisingly, however, it has maintained its dominance even on tasks built on today's highly parallelizable simulators. In this paper, we show that a different class of on-policy algorithms, which estimate the policy gradient using the critic's action gradients, is better suited to highly parallelizable simulators. The primary issues for these algorithms are the limited diversity of the on-policy experiences used for the updates and the instabilities arising from the interaction between the biased critic gradients and the rapidly changing policy distribution. We address the former by simply increasing the number of parallel simulation runs (made possible by GPU-based simulators) together with an appropriate schedule on the policy entropy to ensure sample diversity. We address the latter by adding a policy averaging step and a value averaging step, as in off-policy methods. With these modifications, we observe that the resulting critic-gradient-based on-policy method (CGAC) consistently achieves higher episode returns than existing baselines. Furthermore, in environments with high-dimensional action spaces, CGAC also trains much faster in wall-clock time than the corresponding baselines.
Pages: 14
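A minimal sketch of the kind of update the abstract describes, written in PyTorch. The network sizes, learning rates, Gaussian policy parameterization, `update` signature, and Polyak coefficient `tau` are illustrative assumptions rather than details from the paper; the slow "averaged" network copies stand in for the described policy and value averaging steps, and `entropy_coef` is assumed to come from an external schedule.

```python
# Sketch of a critic-gradient actor-critic update with policy/value averaging.
# Hyperparameters, network shapes, and the update signature are assumptions.
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, n_envs = 8, 2, 4096  # hypothetical sizes; n_envs kept large for sample diversity

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
actor_avg, critic_avg = copy.deepcopy(actor), copy.deepcopy(critic)  # slow "averaged" copies

log_std = nn.Parameter(torch.zeros(act_dim))  # Gaussian policy std; entropy is controlled through this
actor_opt = torch.optim.Adam(list(actor.parameters()) + [log_std], lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)


def update(obs, act, rew, next_obs, done, entropy_coef, gamma=0.99, tau=0.005):
    # Critic: regress Q(s, a) toward a one-step TD target built from the
    # averaged (slowly moving) actor and critic copies.
    with torch.no_grad():
        next_act = actor_avg(next_obs) + log_std.exp() * torch.randn_like(act)
        target_q = rew + gamma * (1.0 - done) * critic_avg(
            torch.cat([next_obs, next_act], dim=-1)).squeeze(-1)
    q = critic(torch.cat([obs, act], dim=-1)).squeeze(-1)
    critic_loss = ((q - target_q) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: the policy gradient flows through the critic's action gradient
    # (reparameterised Gaussian sample), plus an entropy bonus whose
    # coefficient follows an external schedule to keep rollouts diverse.
    # Only the actor parameters and log_std are stepped here.
    new_act = actor(obs) + log_std.exp() * torch.randn_like(act)
    actor_loss = -critic(torch.cat([obs, new_act], dim=-1)).mean() - entropy_coef * log_std.sum()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Policy and value averaging: Polyak-update the slow copies.
    with torch.no_grad():
        for net, net_avg in ((actor, actor_avg), (critic, critic_avg)):
            for p, p_avg in zip(net.parameters(), net_avg.parameters()):
                p_avg.mul_(1.0 - tau).add_(tau * p)
```

In this sketch, sample diversity comes from the large number of parallel environments (`n_envs`) and the entropy schedule, while the averaged actor and critic copies damp the feedback loop between the biased critic gradients and the fast-moving policy.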