Representations for Stable Off-Policy Reinforcement Learning

Cited by: 0

Authors:

Ghosh, Dibya [1]

Bellemare, Marc G. [1]

Affiliation:

[1] Google Res, Mountain View, CA 94043 USA

Source:

25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019) | 2019

Keywords:

FRAMEWORK;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates. In deep reinforcement learning, these issues have been dealt with empirically by adapting and regularizing the representation, in particular with auxiliary tasks. This suggests that representation learning may provide a means to guarantee stability. In this paper, we formally show that there are indeed nontrivial state representations under which the canonical TD algorithm is stable, even when learning off-policy. We analyze representation learning schemes that are based on the transition matrix of a policy, such as proto-value functions, along three axes: approximation error, stability, and ease of estimation. In the most general case, we show that a Schur basis provides convergence guarantees, but is difficult to estimate from samples. For a fixed reward function, we find that an orthogonal basis of the corresponding Krylov subspace is an even better choice. We conclude by empirically demonstrating that these stable representations can be learned using stochastic gradient descent, opening the door to improved techniques for representation learning with deep networks.
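The abstract's central claim, that off-policy TD can diverge under one representation yet be stable under another, can be illustrated with a toy sketch. The example below is not taken from the paper: the two-state chain, the feature values, the step size, and the function name `expected_td0` are all assumptions chosen for illustration. It runs the expected (noise-free) TD(0) update with a single weight under a uniform update distribution that differs from the chain's stationary distribution.

```python
# Toy illustration (hypothetical example, not taken from the paper).
# Two states: state 0 moves to state 1, and state 1 loops on itself.
GAMMA = 0.9
NEXT_STATE = [1, 1]           # deterministic transitions
REWARD = [0.0, 0.0]           # zero reward, so the true value is 0 everywhere
OFF_POLICY_DIST = [0.5, 0.5]  # update distribution; the stationary one is (0, 1)


def expected_td0(features, theta=1.0, step_size=0.1, num_steps=1000):
    """Run the expected TD(0) update for a single scalar weight theta."""
    for _ in range(num_steps):
        update = 0.0
        for s, weight in enumerate(OFF_POLICY_DIST):
            s_next = NEXT_STATE[s]
            td_error = (REWARD[s]
                        + GAMMA * features[s_next] * theta
                        - features[s] * theta)
            update += weight * features[s] * td_error
        theta += step_size * update
    return theta


# Feature (1, 2): the weight is pushed away from the true solution theta = 0.
theta_unstable = expected_td0([1.0, 2.0])

# Feature (1, 1): the constant vector spans an invariant subspace of the
# transition matrix (applying P to it returns the same vector), and it has
# unit norm under the update distribution.
theta_stable = expected_td0([1.0, 1.0])

print(f"feature (1, 2): |theta| = {abs(theta_unstable):.3e}")
print(f"feature (1, 1): |theta| = {abs(theta_stable):.3e}")
```

With these assumed settings, the weight grows without bound under the feature (1, 2) and shrinks toward zero under the constant feature. The constant feature is only a one-dimensional stand-in for the invariant-subspace (Schur-type) representations the abstract describes; it does not show how such representations would be estimated from samples.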


Pages: 10
