Characterizing the Gap Between Actor-Critic and Policy Gradient

Cited by: 0
Authors
Wen, Junfeng [1 ]
Kumar, Saurabh [2 ]
Gummadi, Ramki [3 ]
Schuurmans, Dale [1 ,3 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Google Brain, Mountain View, CA USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper, we explain the gap between AC and PG methods by identifying the exact adjustment to the AC objective/gradient that recovers the true policy gradient of the cumulative reward objective (PG). Furthermore, by viewing the AC method as a two-player Stackelberg game between the actor and critic, we show that the Stackelberg policy gradient can be recovered as a special case of our more general analysis. Based on these results, we develop practical algorithms, Residual Actor-Critic and Stackelberg Actor-Critic, for estimating the correction between AC and PG and use these to modify the standard AC algorithm. Experiments on popular tabular and continuous environments show the proposed corrections can improve both the sample efficiency and final performance of existing AC methods.
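For orientation only, a minimal sketch in standard reinforcement-learning notation of the quantities the abstract refers to. It is not the paper's exact decomposition: the symbols d^π_γ (discounted state distribution), Q_w (learned critic), and g_AC (the surrogate gradient) are notation assumed here, with constant factors such as 1/(1-γ) absorbed into J.

```latex
% Policy gradient theorem: true gradient of the cumulative-reward objective.
% d^{\pi}_{\gamma}: discounted state distribution; Q^{\pi}: exact action value.
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{s \sim d^{\pi}_{\gamma},\, a \sim \pi_{\theta}}
    \left[ Q^{\pi}(s,a)\, \nabla_{\theta} \log \pi_{\theta}(a \mid s) \right]

% Actor-critic surrogate gradient: Q^{\pi} replaced by a learned critic Q_w.
g_{\mathrm{AC}}(\theta, w)
  = \mathbb{E}_{s \sim d^{\pi}_{\gamma},\, a \sim \pi_{\theta}}
    \left[ Q_{w}(s,a)\, \nabla_{\theta} \log \pi_{\theta}(a \mid s) \right]

% Subtracting the two, the gap in this simplified view is driven by the critic error:
\nabla_{\theta} J(\theta) - g_{\mathrm{AC}}(\theta, w)
  = \mathbb{E}_{s \sim d^{\pi}_{\gamma},\, a \sim \pi_{\theta}}
    \left[ \bigl(Q^{\pi}(s,a) - Q_{w}(s,a)\bigr)\,
           \nabla_{\theta} \log \pi_{\theta}(a \mid s) \right]
```

The adjustment characterized in the paper (and its Stackelberg special case) is finer than this single identity; the block above only fixes the standard quantities involved so the abstract's "gap between AC and PG" has a concrete referent.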
Pages: 11