Characterizing the Gap Between Actor-Critic and Policy Gradient

Cited by: 0
Authors
Wen, Junfeng [1 ]
Kumar, Saurabh [2 ]
Gummadi, Ramki [3 ]
Schuurmans, Dale [1 ,3 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Google Brain, Mountain View, CA USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper, we explain the gap between AC and PG methods by identifying the exact adjustment to the AC objective/gradient that recovers the true policy gradient of the cumulative reward objective (PG). Furthermore, by viewing the AC method as a two-player Stackelberg game between the actor and critic, we show that the Stackelberg policy gradient can be recovered as a special case of our more general analysis. Based on these results, we develop practical algorithms, Residual Actor-Critic and Stackelberg Actor-Critic, for estimating the correction between AC and PG and use these to modify the standard AC algorithm. Experiments on popular tabular and continuous environments show the proposed corrections can improve both the sample efficiency and final performance of existing AC methods.
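For orientation only, a minimal sketch in standard reinforcement-learning notation of the quantities the abstract refers to. It is not the paper's exact decomposition: the symbols d^π_γ (discounted state distribution), Q_w (learned critic), and g_AC (the surrogate gradient) are notation assumed here, with constant factors such as 1/(1-γ) absorbed into J.

```latex
% Policy gradient theorem: true gradient of the cumulative-reward objective.
% d^{\pi}_{\gamma}: discounted state distribution; Q^{\pi}: exact action value.
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{s \sim d^{\pi}_{\gamma},\, a \sim \pi_{\theta}}
    \left[ Q^{\pi}(s,a)\, \nabla_{\theta} \log \pi_{\theta}(a \mid s) \right]

% Actor-critic surrogate gradient: Q^{\pi} replaced by a learned critic Q_w.
g_{\mathrm{AC}}(\theta, w)
  = \mathbb{E}_{s \sim d^{\pi}_{\gamma},\, a \sim \pi_{\theta}}
    \left[ Q_{w}(s,a)\, \nabla_{\theta} \log \pi_{\theta}(a \mid s) \right]

% Subtracting the two, the gap in this simplified view is driven by the critic error:
\nabla_{\theta} J(\theta) - g_{\mathrm{AC}}(\theta, w)
  = \mathbb{E}_{s \sim d^{\pi}_{\gamma},\, a \sim \pi_{\theta}}
    \left[ \bigl(Q^{\pi}(s,a) - Q_{w}(s,a)\bigr)\,
           \nabla_{\theta} \log \pi_{\theta}(a \mid s) \right]
```

The adjustment characterized in the paper (and its Stackelberg special case) is finer than this single identity; the block above only fixes the standard quantities involved so the abstract's "gap between AC and PG" has a concrete referent.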
Pages: 11