Smoothed Action Value Functions for Learning Gaussian Policies

Cited by: 0
Authors:
Nachum, Ofir [1 ]
Norouzi, Mohammad [1 ]
Tucker, George [1 ]
Schuurmans, Dale [1 ,2 ]
Affiliations:
[1] Google Brain, Mountain View, CA 94043 USA
[2] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords: (none listed)
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships, we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. The approach is additionally amenable to proximal optimization by augmenting the objective with a penalty on KL-divergence from a previous policy. We find that the ability to learn both a mean and covariance during training leads to significantly improved results on standard continuous control benchmarks.
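For concreteness, the abstract's central quantities can be sketched as follows. This is our own summary in standard notation, not text from the paper: we assume a Gaussian policy pi(a|s) = N(mu(s), Sigma(s)), discount factor gamma, transition kernel P, and a hypothetical KL penalty weight lambda. The smoothed action value is the Q-function averaged under the policy's own noise,

\[ \tilde{Q}^{\pi}(s,a) = \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\,\Sigma(s))}\big[ Q^{\pi}(s,\tilde{a}) \big], \qquad \mathbb{E}_{a \sim \pi(\cdot\mid s)}\big[ Q^{\pi}(s,a) \big] = \tilde{Q}^{\pi}\big(s,\mu(s)\big), \]

and it inherits a Bellman equation, which is what makes it learnable by bootstrapping from sampled transitions:

\[ \tilde{Q}^{\pi}(s,a) = \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\,\Sigma(s))}\Big[ r(s,\tilde{a}) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,\tilde{a})}\big[ \tilde{Q}^{\pi}\big(s',\mu(s')\big) \big] \Big]. \]

The gradient/Hessian relationships the abstract mentions are Gaussian-smoothing identities in the style of Bonnet's and Price's theorems (holding Q^pi fixed):

\[ \nabla_{\mu(s)}\, \mathbb{E}_{a \sim \pi}\big[ Q^{\pi}(s,a) \big] = \nabla_{a} \tilde{Q}^{\pi}(s,a)\big|_{a=\mu(s)}, \qquad \nabla_{\Sigma(s)}\, \mathbb{E}_{a \sim \pi}\big[ Q^{\pi}(s,a) \big] = \tfrac{1}{2}\, \nabla_{a}^{2} \tilde{Q}^{\pi}(s,a)\big|_{a=\mu(s)}. \]

The proximal variant then maximizes the smoothed value at the policy mean minus a KL penalty to the previous policy:

\[ \max_{\mu,\,\Sigma}\; \mathbb{E}_{s}\Big[ \tilde{Q}^{\pi}\big(s,\mu(s)\big) - \lambda\, \mathrm{KL}\big( \mathcal{N}(\mu(s),\Sigma(s)) \,\big\|\, \mathcal{N}(\mu_{\mathrm{old}}(s),\Sigma_{\mathrm{old}}(s)) \big) \Big]. \]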
Pages: 9
Related Papers
50 items in total; items [21]-[30] shown below
  • [21] A note on smoothed estimating functions
    Thavaneswaran, A.
    Singh, J.
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 1993, 45 (04) : 721 - 729
  • [22] Semi-parametric training of autoencoders with Gaussian kernel smoothed topology learning neural networks
    Xiang, Zhiyang
    Deng, Changshou
    Xiang, Xueting
    Yu, Mali
    Xiong, Jing
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (09) : 4933 - 4950
  • [24] Smoothed hat functions in subdivision
    Conti, C.
    Jetter, K.
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2008, 221 (02) : 330 - 345
  • [25] Regularizing Action Policies for Smooth Control with Reinforcement Learning
    Mysore, Siddharth
    Mabsout, Bassel
    Mancuso, Renato
    Saenko, Kate
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 1810 - 1816
  • [26] Action Schema Networks: Generalised Policies with Deep Learning
    Toyer, Sam
    Trevizan, Felipe
    Thiebaux, Sylvie
    Xie, Lexing
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6294 - 6301
  • [27] Gaussian Fluctuation for Smoothed Local Correlations in CUE
    Soshnikov, Alexander
    JOURNAL OF STATISTICAL PHYSICS, 2023, 190 (01)
  • [29] Gaussian and logistic adaptations of smoothed safety first
    Haley, M. R.
    ANNALS OF FINANCE, 2014, 10 (02) : 333 - 345
  • [30] Learning Continuous Control Policies by Stochastic Value Gradients
    Heess, Nicolas
    Wayne, Greg
    Silver, David
    Lillicrap, Timothy
    Tassa, Yuval
    Erez, Tom
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28