Variance Penalized On-Policy and Off-Policy Actor-Critic

Citations: 0

Authors:

Jain, Arushi ^{[1,2]}

Patil, Gandharv ^{[1,2]}

Jain, Ayush ^{[1,2]}

Khetarpal, Khimya ^{[1,2]}

Precup, Doina ^{[1,2,3]}

Affiliations:

[1] McGill Univ, Montreal, PQ, Canada

[2] Mila, Montreal, PQ, Canada

[3] Google DeepMind, Montreal, PQ, Canada

Source:

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021 / Vol. 35

Keywords:

MARKOV DECISION-PROCESSES;

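DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: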

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this paper, we propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return. Previous work uses the second moment of return to estimate the variance indirectly. Instead, we use a much simpler recently proposed direct variance estimator which updates the estimates incrementally using temporal difference methods. Using the variance-penalized criterion, we guarantee the convergence of our algorithm to locally optimal policies for finite state action Markov decision processes. We demonstrate the utility of our algorithm in tabular and continuous MuJoCo domains. Our approach not only performs on par with actor-critic and prior variance-penalization baselines in terms of expected return, but also generates trajectories which have lower variance in the return.
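
The direct variance estimate mentioned in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation. It assumes the commonly used relation in which the return variance is learned by temporal difference updates with the squared TD error as a pseudo-reward and a squared discount factor, and it assumes a penalized objective of the form mean minus `psi` times variance. The two-state chain, the function names, and the parameters `alpha`, `gamma`, and `psi` are all hypothetical choices made for illustration.

```python
import random


def td_mean_and_variance(episodes=50_000, gamma=0.9, alpha=0.005, seed=0):
    """Estimate the mean and variance of the return with TD(0) updates.

    Toy chain (an assumption for illustration only):
      state 0 -> state 1 with reward 0,
      state 1 -> terminal with reward +1 or -1, each with probability 0.5.
    """
    rng = random.Random(seed)
    value = [0.0, 0.0]     # estimate of the expected return per state
    variance = [0.0, 0.0]  # estimate of the return variance per state

    for _ in range(episodes):
        state = 0
        while state is not None:
            if state == 0:
                reward, next_state = 0.0, 1
            else:
                reward, next_state = rng.choice((-1.0, 1.0)), None

            # Terminal states contribute zero value and zero variance.
            next_value = 0.0 if next_state is None else value[next_state]
            next_variance = 0.0 if next_state is None else variance[next_state]

            # Ordinary TD error for the value estimate.
            delta = reward + gamma * next_value - value[state]
            value[state] += alpha * delta

            # Variance update: squared TD error as pseudo-reward,
            # squared discount factor on the bootstrapped variance.
            target = delta ** 2 + gamma ** 2 * next_variance
            variance[state] += alpha * (target - variance[state])

            state = next_state

    return value, variance


def penalized_objective(value, variance, psi, start_state=0):
    """Assumed mean-variance criterion: expected return minus psi * variance."""
    return value[start_state] - psi * variance[start_state]
```

In this toy chain the true values are a mean of 0 in both states, a variance of 1 in state 1, and a variance of `gamma ** 2 = 0.81` in state 0, so the learned estimates should land near those numbers. A larger `psi` makes the criterion favor lower-variance behavior at the cost of expected return.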

Pages: 7899 - 7907

Page count: 9

50 items in total

[1] Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples With On-Policy Experiences
Banerjee, Chayan
Chen, Zhiyong
Noman, Nasimul
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3121 - 3129
[2] Generalized Off-Policy Actor-Critic
Zhang, Shangtong
Boehmer, Wendelin
Whiteson, Shimon
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[3] Off-Policy Actor-critic for Recommender Systems
Chen, Minmin
Xu, Can
Gatto, Vince
Jain, Devanshu
Kumar, Aviral
Chi, Ed
[J]. PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022 : 338 - 349
[4] Off-Policy Actor-Critic with Emphatic Weightings
Graves, Eric
Imani, Ehsan
Kumaraswamy, Raksha
White, Martha
[J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
[5] Meta attention for Off-Policy Actor-Critic
Huang, Jiateng
Huang, Wanrong
Lan, Long
Wu, Dan
[J]. NEURAL NETWORKS, 2023, 163 : 86 - 96
[6] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
Tasfi, Norman
Capretz, Miriam
[J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
[7] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
Zhang, Yan
Zavlanos, Michael M.
[J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019 : 4674 - 4679
[8] Boosting On-Policy Actor-Critic With Shallow Updates in Critic
Li, Luntong
Zhu, Yuanheng
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
[9] Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Xu, Tengyu
Yang, Zhuoran
Wang, Zhaoran
Liang, Yingbin
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[10] Robust Offline Actor-Critic With On-policy Regularized Policy Evaluation
Shuo Cao
Xuesong Wang
Yuhu Cheng
[J]. IEEE/CAA Journal of Automatica Sinica, 2024, 11 (12) - 2511