Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Cited by: 0
Authors
Penedones, Hugo [1 ]
Riquelme, Carlos [2 ]
Vincent, Damien [2 ]
Maennel, Hartmut [2 ]
Mann, Timothy [1 ]
Barreto, Andre [1 ]
Gelly, Sylvain [2 ]
Neu, Gergely [3 ]
Affiliations
[1] DeepMind, London, England
[2] Google Brain, Mountain View, CA 94043 USA
[3] Univ Pompeu Fabra, Barcelona, Spain
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local approximation errors. We address this by proposing an algorithm that adaptively switches between TD and MC in each state, thus mitigating the propagation of errors. Our method is based on learned confidence intervals that detect biases of TD estimates. We demonstrate in a variety of policy evaluation tasks that this simple adaptive algorithm performs competitively with the best approach in hindsight, suggesting that learned confidence intervals are a powerful technique for adapting policy evaluation to use TD or MC returns in a data-driven way.
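The abstract describes the mechanism only at a high level. As a rough illustration, the following minimal tabular sketch in Python shows one way a per-state TD/MC switch driven by confidence intervals could look. Everything specific in it is an assumption rather than the paper's algorithm: the function name evaluate, the trajectory format, the normal-approximation interval around the Monte Carlo mean, and the rule of reporting the MC estimate whenever the TD estimate falls outside that interval.

    # Minimal tabular sketch of adaptive TD/MC policy evaluation.
    # The switching rule and all names here are illustrative assumptions,
    # not the authors' exact method: we keep per-state MC return statistics,
    # build a normal-approximation confidence interval around the MC mean,
    # and fall back to the MC estimate whenever TD leaves that interval.

    import numpy as np

    def evaluate(trajectories, n_states, gamma=0.99, alpha=0.1, z=1.96):
        V_td = np.zeros(n_states)      # TD(0) value estimates
        mc_sum = np.zeros(n_states)    # running sums of MC returns
        mc_sq = np.zeros(n_states)     # running sums of squared returns
        mc_n = np.zeros(n_states)      # visit counts

        for traj in trajectories:      # traj: list of (state, reward, next_state)
            # Backward pass: accumulate the discounted MC return of each visit.
            G = 0.0
            for s, r, s_next in reversed(traj):
                G = r + gamma * G
                mc_sum[s] += G
                mc_sq[s] += G * G
                mc_n[s] += 1
            # Forward pass: TD(0) updates that bootstrap on current estimates
            # (terminal states are assumed to keep their initial value of 0).
            for s, r, s_next in traj:
                target = r + gamma * V_td[s_next]
                V_td[s] += alpha * (target - V_td[s])

        # Per-state switch: trust TD only while it stays inside the MC interval.
        mean = np.divide(mc_sum, np.maximum(mc_n, 1))
        var = np.divide(mc_sq, np.maximum(mc_n, 1)) - mean ** 2
        half = z * np.sqrt(np.maximum(var, 0.0) / np.maximum(mc_n, 1))
        inside = np.abs(V_td - mean) <= half
        return np.where((mc_n > 0) & ~inside, mean, V_td)

In this sketch the switch is purely diagnostic: TD(0) learning runs unchanged, and the interval only decides, state by state, which of the two estimates to trust at readout time.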
Pages: 11
Related Papers
50 items in total (items 21-30 shown)
  • [21] Nonlinear Distributional Gradient Temporal-Difference Learning
    Qu, Chao
    Mannor, Shie
    Xu, Huan
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [22] Gradient Temporal-Difference Learning with Regularized Corrections
    Ghiassian, Sina
    Patterson, Andrew
    Garg, Shivam
    Gupta, Dhawal
    White, Adam
    White, Martha
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020
  • [23] An analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
Van Roy, B
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (05) : 674 - 690
  • [24] Relative loss bounds for temporal-difference learning
    Forster, J
    Warmuth, MK
    MACHINE LEARNING, 2003, 51 (01) : 23 - 50
  • [25] Analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
Van Roy, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 9: PROCEEDINGS OF THE 1996 CONFERENCE, 1997, 9 : 1075 - 1081
  • [26] Approximate value iteration and temporal-difference learning
    de Farias, DP
    Van Roy, B
    IEEE 2000 ADAPTIVE SYSTEMS FOR SIGNAL PROCESSING, COMMUNICATIONS, AND CONTROL SYMPOSIUM - PROCEEDINGS, 2000, : 48 - 51
  • [27] Target-Based Temporal-Difference Learning
    Lee, Donghwan
    He, Niao
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [28] New Versions of Gradient Temporal-Difference Learning
    Lee, Donghwan
    Lim, Han-Dong
    Park, Jihoon
    Choi, Okyong
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5006 - 5013
  • [29] On Generalized Bellman Equations and Temporal-Difference Learning
    Yu, Huizhen
    Mahmood, Ashique Rupam
    Sutton, Richard S.
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2017, 2017, 10233 : 3 - 14
  • [30] Postponed Updates for Temporal-Difference Reinforcement Learning
    van Seijen, Harm
    Whiteson, Shimon
2009 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2009, : 665+