Off-Policy Actor-Critic with Emphatic Weightings

Cited by: 0
Authors
Graves, Eric [1 ]
Imani, Ehsan [1 ]
Kumaraswamy, Raksha [1 ]
White, Martha [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Reinforcement Learning & Artificial Intelligence Lab, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
off-policy learning; policy gradient; actor-critic; reinforcement learning
DOI
Not available
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
A variety of theoretically sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient. The off-policy setting, however, has been less clear due to the existence of multiple objectives and the lack of an explicit off-policy policy gradient theorem. In this work, we unify these objectives into one off-policy objective, and provide a policy gradient theorem for this unified objective. The derivation involves emphatic weightings and interest functions. We show multiple strategies to approximate the gradients, in an algorithm called Actor-Critic with Emphatic weightings (ACE). We prove via a counterexample that previous (semi-gradient) off-policy actor-critic methods, particularly Off-Policy Actor-Critic (OffPAC) and Deterministic Policy Gradient (DPG), converge to the wrong solution, whereas ACE finds the optimal solution. We also highlight why these semi-gradient approaches can still perform well in practice, suggesting strategies for variance reduction in ACE. We empirically study several variants of ACE on two classic control environments and an image-based environment designed to illustrate the tradeoffs made by each gradient approximation. We find that by approximating the emphatic weightings directly, ACE performs as well as or better than OffPAC in all settings tested.
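Although this entry gives only the abstract, the shape of the result can be sketched. Below is a hedged reconstruction of the unified (excursion) objective and the emphatic policy gradient theorem, following the notation the authors use in closely related work; the symbols d_mu (behaviour policy's state distribution), i (interest function), m (emphatic weighting), and P (transition kernel) are assumptions of this sketch, and the exact statement in the paper should be treated as authoritative:

```latex
% Hedged sketch, not a verbatim statement from the paper.
% Unified off-policy (excursion) objective, weighted by an interest function i:
J_\mu(\theta) = \sum_{s} d_\mu(s)\, i(s)\, v_{\pi_\theta}(s)

% Off-policy policy gradient theorem in terms of emphatic weightings m(s):
\nabla_\theta J_\mu(\theta) = \sum_{s} m(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, q_{\pi_\theta}(s, a)

% where m is defined recursively through the target policy's discounted state flow:
m(s') = d_\mu(s')\, i(s') + \gamma \sum_{s} m(s) \sum_{a} \pi_\theta(a \mid s)\, P(s' \mid s, a)
```

Semi-gradient methods such as OffPAC effectively replace m(s) with d_mu(s) i(s), dropping the recursive flow term; the counterexample described in the abstract turns on exactly this substitution.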
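The abstract's "strategies to approximate the gradients" can also be illustrated. The sketch below is a minimal, hypothetical incremental actor update in Python that approximates the emphatic weighting online with a follow-on trace; the function name, argument names, and defaults are illustrative assumptions, not the authors' reference implementation:

```python
import numpy as np

def ace_actor_update(theta, grad_log_pi, delta, rho, rho_prev, F_prev,
                     interest=1.0, gamma=0.99, lambda_a=1.0, alpha=0.001):
    """One ACE-style actor step (hedged sketch, not the paper's code).

    theta        : policy parameters (np.ndarray)
    grad_log_pi  : gradient of log pi(A_t | S_t; theta) w.r.t. theta
    delta        : TD error from the critic, R_{t+1} + gamma * v(S_{t+1}) - v(S_t)
    rho          : importance sampling ratio pi(A_t | S_t) / mu(A_t | S_t)
    rho_prev     : previous step's importance sampling ratio
    F_prev       : previous value of the follow-on trace
    interest     : interest i(S_t) assigned to the current state
    lambda_a     : 0 gives a semi-gradient update, 1 the full emphatic update
    """
    # Follow-on trace: accumulates discounted, importance-corrected interest,
    # approximating the emphatic weighting of the current state online.
    F = gamma * rho_prev * F_prev + interest

    # Emphasis: interpolates between the interest alone (semi-gradient)
    # and the full follow-on trace (emphatic weighting).
    M = (1.0 - lambda_a) * interest + lambda_a * F

    # Emphatically weighted off-policy policy gradient step.
    theta = theta + alpha * rho * M * delta * grad_log_pi
    return theta, F
```

Setting lambda_a = 0 recovers an OffPAC-style semi-gradient update, while lambda_a = 1 uses the full emphatic weighting; intermediate values trade the bias of the semi-gradient against the variance introduced by the product of importance sampling ratios accumulated inside F.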
Pages: 63
Related Papers
50 records in total
  • [11] Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
    Diddigi, Raghuram Bharadwaj
    Jain, Prateek
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
2022 International Joint Conference on Neural Networks (IJCNN), 2022.
  • [12] An Optimistic Approach to the Temporal Difference Error in Off-Policy Actor-Critic Algorithms
    Saglam, Baturay
    Mutlu, Furkan B.
    Kozat, Suleyman S.
2022 IEEE Symposium Series on Computational Intelligence (SSCI), 2022: 875-883.
  • [13] Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
    Khodadadian, Sajad
    Chen, Zaiwei
    Maguluri, Siva Theja
International Conference on Machine Learning, Vol. 139, 2021.
  • [14] Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances
    Song, Ruizhuo
    Lewis, Frank L.
    Wei, Qinglai
    Zhang, Huaguang
IEEE Transactions on Cybernetics, 2016, 46(5): 1041-1050.
  • [15] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
    Haarnoja, Tuomas
    Zhou, Aurick
    Abbeel, Pieter
    Levine, Sergey
International Conference on Machine Learning, Vol. 80, 2018.
  • [16] Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples With On-Policy Experiences
    Banerjee, Chayan
    Chen, Zhiyong
    Noman, Nasimul
IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(3): 3121-3129.
  • [17] An Improved Off-Policy Actor-Critic Algorithm with Historical Behaviors Reusing for Robotic Control
    Zhang, Huaqing
    Ma, Hongbin
    Jin, Ying
Intelligent Robotics and Applications (ICIRA 2022), Part IV, 2022, 13458: 449-458.
  • [18] A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
    Suttle, Wesley
    Yang, Zhuoran
    Zhang, Kaiqing
    Wang, Zhaoran
    Basar, Tamer
    Liu, Ji
IFAC-PapersOnLine, 2020, 53(2): 1549-1554.
  • [19] Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
    Chen, Zaiwei
    Khodadadian, Sajad
    Maguluri, Siva Theja
IEEE Control Systems Letters, 2022, 6: 2611-2616.
  • [20] Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
    Duan, Jingliang
    Guan, Yang
    Li, Shengbo Eben
    Ren, Yangang
    Sun, Qi
    Cheng, Bo
IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(11): 6584-6598.