Expected Policy Gradients for Reinforcement Learning

Cited: 0
Authors
Ciosek, Kamil [1 ]
Whiteson, Shimon [2 ]
Affiliations
[1] Microsoft Res Cambridge, 21 Stn Rd, Cambridge CB1 2FB, England
[2] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
Funding
European Research Council
Keywords
policy gradients; exploration; bounded actions; reinforcement learning; Markov decision process (MDP);
DOI
Not available
CLC number
TP [Automation and Computer Technology]
Discipline classification code
0812
Abstract
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by Expected Sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadratic critics and then extend it to a universal analytical method, covering a broad class of actors and critics, including Gaussian, exponential families, and policies with bounded support. For Gaussian policies, we introduce an exploration method that uses covariance proportional to e^H, where H is the scaled Hessian of the critic with respect to the actions. For discrete action spaces, we derive a variant of EPG based on softmax policies. We also establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we provide an extensive experimental evaluation of EPG and show that it outperforms existing approaches on multiple challenging control domains.
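For concreteness, here is a minimal sketch of the estimator the abstract describes, in illustrative notation (the symbols \hat{g}, \hat{Q}, H, and \Sigma are ours, not necessarily the paper's):

    % SPG estimates the per-state gradient from the single sampled action a_t:
    %   \hat{g}_{SPG}(s_t) = \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, \hat{Q}(s_t, a_t)
    % EPG instead integrates (or, for discrete actions, sums) over all actions:
    \hat{g}_{EPG}(s_t) = \int_{\mathcal{A}} \nabla_\theta \pi_\theta(a \mid s_t) \, \hat{Q}(s_t, a) \, \mathrm{d}a
    % Gaussian exploration with covariance proportional to e^H, where H(s_t) is
    % the scaled Hessian of the critic with respect to the actions:
    \Sigma(s_t) \propto e^{H(s_t)}

Because only the state distribution is sampled while the inner action integral is computed analytically (in closed form for Gaussian policies with quadratic critics, per the abstract), the estimator avoids the across-action sampling variance of SPG, which is the source of the variance reduction claimed above.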
Pages: 51