Smoothed Action Value Functions for Learning Gaussian Policies

Cited by: 0
Authors:
Nachum, Ofir [1 ]
Norouzi, Mohammad [1 ]
Tucker, George [1 ]
Schuurmans, Dale [1 ,2 ]
Affiliations:
[1] Google Brain, Mountain View, CA 94043 USA
[2] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords: (none listed)
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships, we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. The approach is additionally amenable to proximal optimization by augmenting the objective with a penalty on KL-divergence from a previous policy. We find that the ability to learn both a mean and covariance during training leads to significantly improved results on standard continuous control benchmarks.
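For concreteness, the abstract's central quantities can be sketched as follows. This is our own summary in standard notation, not text from the paper: we assume a Gaussian policy pi(a|s) = N(mu(s), Sigma(s)), discount factor gamma, transition kernel P, and a hypothetical KL penalty weight lambda. The smoothed action value is the Q-function averaged under the policy's own noise,

\[ \tilde{Q}^{\pi}(s,a) = \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\,\Sigma(s))}\big[ Q^{\pi}(s,\tilde{a}) \big], \qquad \mathbb{E}_{a \sim \pi(\cdot\mid s)}\big[ Q^{\pi}(s,a) \big] = \tilde{Q}^{\pi}\big(s,\mu(s)\big), \]

and it inherits a Bellman equation, which is what makes it learnable by bootstrapping from sampled transitions:

\[ \tilde{Q}^{\pi}(s,a) = \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\,\Sigma(s))}\Big[ r(s,\tilde{a}) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,\tilde{a})}\big[ \tilde{Q}^{\pi}\big(s',\mu(s')\big) \big] \Big]. \]

The gradient/Hessian relationships the abstract mentions are Gaussian-smoothing identities in the style of Bonnet's and Price's theorems (holding Q^pi fixed):

\[ \nabla_{\mu(s)}\, \mathbb{E}_{a \sim \pi}\big[ Q^{\pi}(s,a) \big] = \nabla_{a} \tilde{Q}^{\pi}(s,a)\big|_{a=\mu(s)}, \qquad \nabla_{\Sigma(s)}\, \mathbb{E}_{a \sim \pi}\big[ Q^{\pi}(s,a) \big] = \tfrac{1}{2}\, \nabla_{a}^{2} \tilde{Q}^{\pi}(s,a)\big|_{a=\mu(s)}. \]

The proximal variant then maximizes the smoothed value at the policy mean minus a KL penalty to the previous policy:

\[ \max_{\mu,\,\Sigma}\; \mathbb{E}_{s}\Big[ \tilde{Q}^{\pi}\big(s,\mu(s)\big) - \lambda\, \mathrm{KL}\big( \mathcal{N}(\mu(s),\Sigma(s)) \,\big\|\, \mathcal{N}(\mu_{\mathrm{old}}(s),\Sigma_{\mathrm{old}}(s)) \big) \Big]. \]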
Pages: 9
Related Papers
50 items in total; items [21]-[30] shown below
  • [21] A note on smoothed estimating functions
    Thavaneswaran, A.
    Singh, J.
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 1993, 45 (04) : 721 - 729
  • [22] Semi-parametric training of autoencoders with Gaussian kernel smoothed topology learning neural networks
    Xiang, Zhiyang
    Deng, Changshou
    Xiang, Xueting
    Yu, Mali
    Xiong, Jing
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (09) : 4933 - 4950
  • [24] Smoothed hat functions in subdivision
    Conti, C.
    Jetter, K.
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2008, 221 (02) : 330 - 345
  • [25] Regularizing Action Policies for Smooth Control with Reinforcement Learning
    Mysore, Siddharth
    Mabsout, Bassel
    Mancuso, Renato
    Saenko, Kate
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 1810 - 1816
  • [26] Action Schema Networks: Generalised Policies with Deep Learning
    Toyer, Sam
    Trevizan, Felipe
    Thiebaux, Sylvie
    Xie, Lexing
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6294 - 6301
  • [27] Gaussian Fluctuation for Smoothed Local Correlations in CUE
    Soshnikov, Alexander
    JOURNAL OF STATISTICAL PHYSICS, 2023, 190 (01)
  • [29] Gaussian and logistic adaptations of smoothed safety first
    Haley, M. R.
    ANNALS OF FINANCE, 2014, 10 (02) : 333 - 345
  • [30] Learning Continuous Control Policies by Stochastic Value Gradients
    Heess, Nicolas
    Wayne, Greg
    Silver, David
    Lillicrap, Timothy
    Tassa, Yuval
    Erez, Tom
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28