Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution

Cited by: 0
Authors:
Chou, Po-Wei [1]
Maturana, Daniel [1]
Scherer, Sebastian [1]
Affiliation:
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
Keywords:
NEURAL-NETWORKS
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
Recently, reinforcement learning with deep neural networks has achieved great success in challenging continuous control problems such as 3D locomotion and robotic manipulation. However, in real-world control problems the actions an agent can take are bounded by physical constraints, which introduces a bias when the standard Gaussian distribution is used as the stochastic policy. In this work, we propose the Beta distribution as an alternative and analyze the bias and variance of the policy gradients under both policies. We show that the Beta policy is bias-free and provides significantly faster convergence and higher scores than the Gaussian policy when both are used with trust region policy optimization (TRPO) and actor-critic with experience replay (ACER), the state-of-the-art on-policy and off-policy stochastic methods respectively, on OpenAI Gym's and MuJoCo's continuous control environments.
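As a rough illustration of the idea in the abstract (not the authors' implementation), the sketch below shows a Beta policy head in pure Python: two raw network outputs are mapped through a softplus to shape parameters α, β > 1 so the density stays unimodal, an action is sampled on (0, 1) and affinely rescaled into the bounded action range, and the Beta log-density is the quantity a policy-gradient method such as TRPO or ACER would differentiate. All function names here are illustrative.

```python
import math
import random

def softplus(x):
    # numerically simple softplus; maps any real output to a positive value
    return math.log1p(math.exp(x))

def beta_params(raw_a, raw_b):
    # hypothetical mapping from two raw network outputs to Beta shape
    # parameters; the +1 keeps alpha, beta > 1, i.e. a unimodal density
    return 1.0 + softplus(raw_a), 1.0 + softplus(raw_b)

def beta_log_prob(x, alpha, beta):
    # log density of Beta(alpha, beta) at x in (0, 1); this is the term
    # whose gradient drives the stochastic policy update
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x) - log_norm

def sample_action(alpha, beta, low, high, rng=random):
    # sample on (0, 1), then rescale to [low, high]; unlike a clipped
    # Gaussian, no probability mass piles up at the action bounds
    x = rng.betavariate(alpha, beta)
    return low + (high - low) * x
```

A Gaussian policy on a bounded action space must clip or truncate samples at the bounds, which skews the effective action distribution and biases the gradient estimate; because the Beta distribution's support is exactly (0, 1), rescaling it to the action range avoids that boundary effect entirely.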
Pages: 10