Representations for Stable Off-Policy Reinforcement Learning

Cited by: 0
Authors: Ghosh, Dibya [1]; Bellemare, Marc G. [1]
Affiliations: [1] Google Research, Mountain View, CA 94043, USA
Keywords: FRAMEWORK
DOI: not available
CLC classification: TP [automation and computer technology]
Subject classification code: 0812
Abstract
Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates. In deep reinforcement learning, these issues have been dealt with empirically by adapting and regularizing the representation, in particular with auxiliary tasks. This suggests that representation learning may provide a means to guarantee stability. In this paper, we formally show that there are indeed nontrivial state representations under which the canonical TD algorithm is stable, even when learning off-policy. We analyze representation learning schemes that are based on the transition matrix of a policy, such as proto-value functions, along three axes: approximation error, stability, and ease of estimation. In the most general case, we show that a Schur basis provides convergence guarantees, but is difficult to estimate from samples. For a fixed reward function, we find that an orthogonal basis of the corresponding Krylov subspace is an even better choice. We conclude by empirically demonstrating that these stable representations can be learned using stochastic gradient descent, opening the door to improved techniques for representation learning with deep networks.
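To make the abstract's stability claim concrete, here is a minimal, hypothetical NumPy sketch (not code from the paper): for a small random MDP with a fixed reward, it forms an orthogonal basis of the Krylov subspace span{r, Pr, ..., P^{k-1}r} of the policy's transition matrix and checks the standard off-policy TD(0) stability condition, namely that the iteration matrix A = Phi^T D (gamma*P - I) Phi has eigenvalues with negative real part. The MDP construction, dimensions, and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical example: random MDP, fixed reward, arbitrary off-policy state weighting.
rng = np.random.default_rng(0)
n_states, k, gamma = 20, 5, 0.95

# Row-stochastic transition matrix P of the target policy and a fixed reward vector r.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)

# Off-policy state weighting D (an arbitrary distribution, not P's stationary one).
d = rng.random(n_states)
D = np.diag(d / d.sum())

# Orthogonal basis of the Krylov subspace: QR-factorize [r, P r, ..., P^{k-1} r].
K = np.column_stack([np.linalg.matrix_power(P, i) @ r for i in range(k)])
Phi, _ = np.linalg.qr(K)

# TD(0) iteration matrix under features Phi; stability requires all eigenvalues
# to have negative real part.
A = Phi.T @ D @ (gamma * P - np.eye(n_states)) @ Phi
print("max real part of eig(A):", np.max(np.linalg.eigvals(A).real))
```

A negative printed value indicates that expected TD(0) updates contract under this representation and weighting; with raw one-hot or arbitrary features and an off-policy weighting, the same check can come out positive, which is the divergence phenomenon the paper addresses.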
Pages: 10