Learning Continuous Control Policies by Stochastic Value Gradients

Cited by: 0
Authors
Heess, Nicolas [1 ]
Wayne, Greg [1 ]
Silver, David [1 ]
Lillicrap, Timothy [1 ]
Tassa, Yuval [1 ]
Erez, Tom [1 ]
Affiliations
[1] Google DeepMind, London, England
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
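The abstract's core idea is to reparameterize stochasticity as a deterministic function of exogenous noise, so that gradients can be backpropagated through a stochastic Bellman recursion. Below is a minimal sketch of a one-step SVG(1)-style policy update in PyTorch, under stated assumptions: all names (policy_net, model_net, value_net, log_sigma) and hyperparameters are illustrative, not from the paper's code, and the model/value fitting steps (regression on observed transitions) are omitted.

```python
# Minimal SVG(1)-style sketch: action and next state are deterministic
# functions of exogenous noise, so the one-step value estimate is
# differentiable in the policy parameters. Illustrative only.
import torch

state_dim, action_dim, hidden, gamma = 4, 2, 64, 0.99

# Reparameterized Gaussian policy: a = mu_theta(s) + sigma * eta, eta ~ N(0, I).
policy_net = torch.nn.Sequential(
    torch.nn.Linear(state_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, action_dim))
log_sigma = torch.zeros(action_dim, requires_grad=True)

# Learned dynamics model: s' = f(s, a) + xi, with exogenous model noise xi
# (inferred from observed transitions in the paper; sampled here for brevity).
model_net = torch.nn.Sequential(
    torch.nn.Linear(state_dim + action_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, state_dim))

# Learned value function V(s'), assumed already fitted by regression.
value_net = torch.nn.Sequential(
    torch.nn.Linear(state_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, 1))

def reward(s, a):
    # Placeholder differentiable reward (quadratic cost); an assumption here.
    return -(s.pow(2).sum(-1) + 0.1 * a.pow(2).sum(-1))

opt = torch.optim.Adam(list(policy_net.parameters()) + [log_sigma], lr=1e-3)

# One update on a batch of states observed from the environment (not from
# model rollouts), which is what limits compounded model error.
s = torch.randn(32, state_dim)              # stand-in for observed states
eta = torch.randn(32, action_dim)           # exogenous policy noise
a = policy_net(s) + log_sigma.exp() * eta   # action as a function of noise
xi = 0.01 * torch.randn(32, state_dim)      # exogenous model noise
s_next = model_net(torch.cat([s, a], dim=-1)) + xi

# Backprop r(s, a) + gamma * V(s') through the learned model and the policy;
# only the policy parameters are updated by this optimizer.
objective = (reward(s, a) + gamma * value_net(s_next).squeeze(-1)).mean()
opt.zero_grad()
(-objective).backward()  # gradient ascent on the one-step value estimate
opt.step()
```

Setting the horizon of this backpropagation to zero steps (critic only) or to the full trajectory recovers the model-free and fully model-based ends of the spectrum the abstract mentions.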
Pages: 9
Related Papers
50 records in total
  • [1] Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution
    Chou, Po-Wei
    Maturana, Daniel
    Scherer, Sebastian
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017
  • [2] Learning Continuous-Action Control Policies
    Pazis, Jason
    Lagoudakis, Michail G.
    [J]. ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009: 169-176
  • [3] Learning Stochastic Parametric Differentiable Predictive Control Policies
    Drgona, Jan
    Mukherjee, Sayak
    Tuor, Aaron
    Halappanavar, Mahantesh
    Vrabie, Draguna
    [J]. IFAC PAPERSONLINE, 2022, 55 (25): 121-126
  • [4] Reinforcement learning for continuous stochastic control problems
    Munos, R
    Bourgine, P
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 10, 1998, 10: 1029-1035
  • [5] Autoregressive Policies for Continuous Control Deep Reinforcement Learning
    Korenkevych, Dmytro
    Mahmood, A. Rupam
    Vasan, Gautham
    Bergstra, James
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 2754-2762
  • [6] Learning Policies for Continuous Control via Transition Models
    Huebotter, Justus
    Thill, Serge
    van Gerven, Marcel
    Lanillos, Pablo
    [J]. ACTIVE INFERENCE, IWAI 2022, 2023, 1721: 162-178
  • [7] Continuous-review tracking policies for dynamic control of stochastic networks
    Maglaras, C
    [J]. QUEUEING SYSTEMS, 2003, 43 (1-2): 43-80
  • [8] Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
    Soemers, Dennis J. N. J.
    Piette, Eric
    Stephenson, Matthew
    Browne, Cameron
    [J]. 2019 IEEE CONFERENCE ON GAMES (COG), 2019
  • [9] Learning First-to-Spike Policies for Neuromorphic Control Using Policy Gradients
    Rosenfeld, Bleema
    Simeone, Osvaldo
    Rajendran, Bipin
    [J]. 2019 IEEE 20TH INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS (SPAWC 2019), 2019