Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution

Cited by: 0
Authors:
Chou, Po-Wei [1]
Maturana, Daniel [1]
Scherer, Sebastian [1]
Affiliation:
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
Keywords:
NEURAL-NETWORKS
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
Recently, reinforcement learning with deep neural networks has achieved great success in challenging continuous control problems such as 3D locomotion and robotic manipulation. However, in real-world control problems the actions an agent can take are bounded by physical constraints, which introduces a bias when the standard Gaussian distribution is used as the stochastic policy. In this work, we propose the Beta distribution as an alternative and analyze the bias and variance of the policy gradients under both policies. We show that the Beta policy is bias-free and provides significantly faster convergence and higher scores than the Gaussian policy when both are used with trust region policy optimization (TRPO) and actor-critic with experience replay (ACER), the state-of-the-art on-policy and off-policy stochastic methods respectively, on OpenAI Gym's and MuJoCo's continuous control environments.
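As a rough illustration of the idea in the abstract (not the authors' implementation), the sketch below shows a Beta policy head in pure Python: two raw network outputs are mapped through a softplus to shape parameters α, β > 1 so the density stays unimodal, an action is sampled on (0, 1) and affinely rescaled into the bounded action range, and the Beta log-density is the quantity a policy-gradient method such as TRPO or ACER would differentiate. All function names here are illustrative.

```python
import math
import random

def softplus(x):
    # numerically simple softplus; maps any real output to a positive value
    return math.log1p(math.exp(x))

def beta_params(raw_a, raw_b):
    # hypothetical mapping from two raw network outputs to Beta shape
    # parameters; the +1 keeps alpha, beta > 1, i.e. a unimodal density
    return 1.0 + softplus(raw_a), 1.0 + softplus(raw_b)

def beta_log_prob(x, alpha, beta):
    # log density of Beta(alpha, beta) at x in (0, 1); this is the term
    # whose gradient drives the stochastic policy update
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x) - log_norm

def sample_action(alpha, beta, low, high, rng=random):
    # sample on (0, 1), then rescale to [low, high]; unlike a clipped
    # Gaussian, no probability mass piles up at the action bounds
    x = rng.betavariate(alpha, beta)
    return low + (high - low) * x
```

A Gaussian policy on a bounded action space must clip or truncate samples at the bounds, which skews the effective action distribution and biases the gradient estimate; because the Beta distribution's support is exactly (0, 1), rescaling it to the action range avoids that boundary effect entirely.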
Pages: 10