Smoothed Action Value Functions for Learning Gaussian Policies

Cited: 0
Authors
Nachum, Ofir [1]
Norouzi, Mohammad [1]
Tucker, George [1]
Schuurmans, Dale [1,2]
Affiliations
[1] Google Brain, Mountain View, CA 94043 USA
[2] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships, we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. The approach is additionally amenable to proximal optimization by augmenting the objective with a penalty on KL-divergence from a previous policy. We find that the ability to learn both a mean and covariance during training leads to significantly improved results on standard continuous control benchmarks.
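In symbols (a reconstruction from the abstract's description; the notation mu(s), Sigma(s) for the Gaussian policy's mean and covariance and J for expected reward is assumed here, not taken from this record), the smoothed action value and the gradient relationships read approximately:

\tilde{Q}^{\pi}(s, a) = \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\, \Sigma(s))}\!\left[ Q^{\pi}(s, \tilde{a}) \right], \qquad
\nabla_{\mu} J \approx \mathbb{E}_{s}\!\left[ \nabla_{a} \tilde{Q}^{\pi}(s, a)\big|_{a = \mu(s)} \right], \qquad
\nabla_{\Sigma} J \approx \mathbb{E}_{s}\!\left[ \tfrac{1}{2} \nabla_{a}^{2} \tilde{Q}^{\pi}(s, a)\big|_{a = \mu(s)} \right].

That is, the smoothed Q-value is the expected Q-value under Gaussian perturbations of the action, and the policy's mean and covariance can be updated from the first derivative and the Hessian of the learned smoothed Q-value approximator at the mean.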
Pages: 9
Related Papers
50 records in total
  • [31] Learning and smoothed analysis
    Microsoft Research, New England, United States
    Proc. Annu. IEEE Symp. Found. Comput. Sci. FOCS, 2009, : 395 - 404
  • [32] Learning and smoothed analysis
    Kalai, Adam Tauman
    Samorodnitsky, Alex
    Teng, Shang-Hua
    2009 50TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE: FOCS 2009, PROCEEDINGS, 2009, : 395 - 404
  • [33] Teacher-directed learning with Gaussian and sigmoid activation functions
    Kamimura, R
    NEURAL INFORMATION PROCESSING, 2004, 3316 : 530 - 536
  • [34] Frequency Effects in Action Versus Value Learning
    Don, Hilary J.
    Worthy, Darrell A.
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-LEARNING MEMORY AND COGNITION, 2022, 48 (09) : 1311 - 1327
  • [35] Sparse Approximations to Value Functions in Reinforcement Learning
    Jakab, Hunor S.
    Csato, Lehel
    ARTIFICIAL NEURAL NETWORKS, 2015, : 295 - 314
  • [36] Multiagent Reinforcement Learning With Unshared Value Functions
    Hu, Yujing
    Gao, Yang
    An, Bo
    IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (04) : 647 - 662
  • [37] Brazilian Socio-Environmental Policies and the Learning of a New Action
    Silva Oliveira, Anderson Eduardo
    DESENVOLVIMENTO E MEIO AMBIENTE, 2011, 23 : 133 - 148
  • [38] Discrete Action On-Policy Learning with Action-Value Critic
    Yue, Yuguang
    Tang, Yunhao
    Yin, Mingzhang
    Zhou, Mingyuan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1977 - 1986
  • [39] The value of interest rate stabilization policies when agents are learning
    Duffy, John
    Xiao, Wei
    JOURNAL OF MONEY CREDIT AND BANKING, 2007, 39 (08) : 2041 - 2056
  • [40] Approximating Paley-Wiener functions by smoothed step functions
    Beaty, M. G.
    Dodson, M. M.
    Higgins, J. R.
    JOURNAL OF APPROXIMATION THEORY, 1994, 78 (03) : 433 - 445