Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

Cited by: 0

Authors:

Gurumurthy, Swaminathan [1]

Manchester, Zachary [1]

Kolter, J. Zico [1,2]

Affiliations:

[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

[2] Bosch Ctr AI, Sunnyvale, CA USA

Source:

LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211 | 2023 / Volume 211

Keywords:

Reinforcement Learning; Actor Critic; Continuous control; Highly parallel Environments;

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

On-policy reinforcement learning algorithms have been shown to be remarkably efficient at learning policies for continuous control robotics tasks. They are highly parallelizable and hence have benefited tremendously from the recent rise in GPU-based parallel simulators. The most widely used on-policy reinforcement learning algorithm is proximal policy optimization (PPO), which was introduced in 2017 and was designed for a somewhat different setting with CPU-based serial or less parallelizable simulators. Surprisingly, however, it has maintained dominance even on tasks based on the highly parallelizable simulators of today. In this paper, we show that a different class of on-policy algorithms, based on estimating the policy gradient using the critic-action gradients, is better suited to highly parallelizable simulators. The primary issues for these algorithms arise from the lack of diversity of the on-policy experiences used for the updates and from the instabilities arising from the interaction between the biased critic gradients and the rapidly changing policy distribution. We address the former by simply increasing the number of parallel simulation runs (thanks to the GPU-based simulators) along with an appropriate schedule on the policy entropy to ensure diversity of samples. We address the latter by adding a policy averaging step and a value averaging step (as in off-policy methods). With these modifications, we observe that the critic-gradient-based on-policy method (CGAC) consistently achieves higher episode returns than existing baselines. Furthermore, in environments with high-dimensional action spaces, CGAC also trains much faster (in wall-clock time) than the corresponding baselines.
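
The ingredients named in the abstract can be illustrated with a minimal toy sketch. It is not the authors' CGAC implementation. The 1-D critic, the linear deterministic policy, the linear entropy schedule, the use of exponential (Polyak) averaging as the "averaging step", and all constants and function names are assumptions chosen for illustration only.

```python
# Toy sketch only: every function, constant, and the 1-D problem itself
# is a hypothetical stand-in, not the method from the paper.

def critic_action_grad(state, action, target_gain=2.0):
    """dQ/da for an assumed toy critic Q(s, a) = -(a - target_gain * s)**2."""
    return -2.0 * (action - target_gain * state)

def policy_gradient(theta, states):
    """Critic-action-gradient estimate for a policy a = theta * s.

    Chain rule: dQ/dtheta = dQ/da * da/dtheta, averaged over the batch.
    """
    total = 0.0
    for s in states:
        a = theta * s
        total += critic_action_grad(s, a) * s  # da/dtheta = s
    return total / len(states)

def polyak_update(avg, new, tau):
    """Exponential moving average, one common form of parameter averaging."""
    return (1.0 - tau) * avg + tau * new

def entropy_coef(step, total_steps, start=0.2, end=0.02):
    """Assumed linear schedule for an entropy coefficient (not used by the
    deterministic toy policy below; shown only to illustrate a schedule)."""
    frac = min(step / total_steps, 1.0)
    return (1.0 - frac) * start + frac * end

def train(steps=200, lr=0.1, tau=0.05):
    # Fixed batch standing in for states gathered from parallel environments.
    states = [0.5, 1.0, 1.5, 2.0]
    theta = 0.0      # fast-moving policy parameter
    theta_avg = 0.0  # slowly averaged copy of the policy parameter
    for _ in range(steps):
        theta += lr * policy_gradient(theta, states)  # gradient ascent on Q
        theta_avg = polyak_update(theta_avg, theta, tau)
    return theta, theta_avg
```

Calling `train()` returns the raw and averaged policy parameters; in this toy setting both approach the assumed critic's optimum `target_gain`, with the averaged copy lagging behind the raw one.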

Pages: 14
