Learning Continuous Control Policies by Stochastic Value Gradients

Cited by: 0
Authors
Heess, Nicolas [1 ]
Wayne, Greg [1 ]
Silver, David [1 ]
Lillicrap, Timothy [1 ]
Tassa, Yuval [1 ]
Erez, Tom [1 ]
Affiliations
[1] Google DeepMind, London, England
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
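The abstract's core idea is to reparameterize stochasticity as a deterministic function of exogenous noise, so that gradients can be backpropagated through a stochastic Bellman recursion. Below is a minimal sketch of a one-step SVG(1)-style policy update in PyTorch, under stated assumptions: all names (policy_net, model_net, value_net, log_sigma) and hyperparameters are illustrative, not from the paper's code, and the model/value fitting steps (regression on observed transitions) are omitted.

```python
# Minimal SVG(1)-style sketch: action and next state are deterministic
# functions of exogenous noise, so the one-step value estimate is
# differentiable in the policy parameters. Illustrative only.
import torch

state_dim, action_dim, hidden, gamma = 4, 2, 64, 0.99

# Reparameterized Gaussian policy: a = mu_theta(s) + sigma * eta, eta ~ N(0, I).
policy_net = torch.nn.Sequential(
    torch.nn.Linear(state_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, action_dim))
log_sigma = torch.zeros(action_dim, requires_grad=True)

# Learned dynamics model: s' = f(s, a) + xi, with exogenous model noise xi
# (inferred from observed transitions in the paper; sampled here for brevity).
model_net = torch.nn.Sequential(
    torch.nn.Linear(state_dim + action_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, state_dim))

# Learned value function V(s'), assumed already fitted by regression.
value_net = torch.nn.Sequential(
    torch.nn.Linear(state_dim, hidden), torch.nn.Tanh(),
    torch.nn.Linear(hidden, 1))

def reward(s, a):
    # Placeholder differentiable reward (quadratic cost); an assumption here.
    return -(s.pow(2).sum(-1) + 0.1 * a.pow(2).sum(-1))

opt = torch.optim.Adam(list(policy_net.parameters()) + [log_sigma], lr=1e-3)

# One update on a batch of states observed from the environment (not from
# model rollouts), which is what limits compounded model error.
s = torch.randn(32, state_dim)              # stand-in for observed states
eta = torch.randn(32, action_dim)           # exogenous policy noise
a = policy_net(s) + log_sigma.exp() * eta   # action as a function of noise
xi = 0.01 * torch.randn(32, state_dim)      # exogenous model noise
s_next = model_net(torch.cat([s, a], dim=-1)) + xi

# Backprop r(s, a) + gamma * V(s') through the learned model and the policy;
# only the policy parameters are updated by this optimizer.
objective = (reward(s, a) + gamma * value_net(s_next).squeeze(-1)).mean()
opt.zero_grad()
(-objective).backward()  # gradient ascent on the one-step value estimate
opt.step()
```

Setting the horizon of this backpropagation to zero steps (critic only) or to the full trajectory recovers the model-free and fully model-based ends of the spectrum the abstract mentions.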
Pages: 9
Related Papers
50 records in total
  • [1] Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution
    Chou, Po-Wei
    Maturana, Daniel
    Scherer, Sebastian
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017
  • [2] Learning Continuous-Action Control Policies
    Pazis, Jason
    Lagoudakis, Michail G.
    [J]. ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009: 169-176
  • [3] Learning Stochastic Parametric Differentiable Predictive Control Policies
    Drgona, Jan
    Mukherjee, Sayak
    Tuor, Aaron
    Halappanavar, Mahantesh
    Vrabie, Draguna
    [J]. IFAC PAPERSONLINE, 2022, 55 (25): 121-126
  • [4] Reinforcement learning for continuous stochastic control problems
    Munos, R
    Bourgine, P
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 10, 1998, 10: 1029-1035
  • [5] Autoregressive Policies for Continuous Control Deep Reinforcement Learning
    Korenkevych, Dmytro
    Mahmood, A. Rupam
    Vasan, Gautham
    Bergstra, James
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 2754-2762
  • [6] Learning Policies for Continuous Control via Transition Models
    Huebotter, Justus
    Thill, Serge
    van Gerven, Marcel
    Lanillos, Pablo
    [J]. ACTIVE INFERENCE, IWAI 2022, 2023, 1721: 162-178
  • [7] Continuous-review tracking policies for dynamic control of stochastic networks
    Maglaras, C
    [J]. QUEUEING SYSTEMS, 2003, 43 (1-2): 43-80
  • [8] Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
    Soemers, Dennis J. N. J.
    Piette, Eric
    Stephenson, Matthew
    Browne, Cameron
    [J]. 2019 IEEE CONFERENCE ON GAMES (COG), 2019
  • [9] Learning First-to-Spike Policies for Neuromorphic Control Using Policy Gradients
    Rosenfeld, Bleema
    Simeone, Osvaldo
    Rajendran, Bipin
    [J]. 2019 IEEE 20TH INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS (SPAWC 2019), 2019