Average cost temporal-difference learning

Cited by: 0
Authors
Tsitsiklis, JN [1]
Van Roy, B [1]
Affiliations
[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are comprised of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel previous work by the authors, involving approximations of discounted cost-to-go.
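The abstract does not state the update rule, so the following is only a minimal sketch of one common form of average-cost TD(lambda) with linear features, run along a single trajectory. The function name `average_cost_td`, the step-size schedule, the two-state chain, and the single indicator feature are all illustrative assumptions and are not taken from the record above.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def average_cost_td(P, g, phi, lam=0.5, steps=200_000, seed=0):
    """Sketch of average-cost TD(lambda) on a finite Markov chain.

    P   : row-stochastic transition matrix (list of rows)
    g   : per-state cost
    phi : per-state feature vectors (the fixed basis functions)
    Returns (mu, r): the average-cost estimate and the feature weights,
    so that dot(phi[x], r) approximates the differential cost of x
    up to an additive constant.
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    k = len(phi[0])
    mu, r, z = 0.0, [0.0] * k, [0.0] * k
    x = 0
    for t in range(steps):
        # Assumed schedule: sum of steps diverges, sum of squares converges.
        gamma = 50.0 / (500.0 + t)
        x_next = rng.choices(states, weights=P[x])[0]
        # Eligibility trace: z_t = lam * z_{t-1} + phi(x_t).
        z = [lam * zi + fi for zi, fi in zip(z, phi[x])]
        # Temporal difference with the average cost subtracted.
        d = g[x] - mu + dot(phi[x_next], r) - dot(phi[x], r)
        r = [ri + gamma * d * zi for ri, zi in zip(r, z)]
        # Running estimate of the average cost.
        mu += gamma * (g[x] - mu)
        x = x_next
    return mu, r


# Toy irreducible aperiodic chain (hypothetical example data).
# Its stationary distribution is (2/3, 1/3), so the exact average cost
# is 1.0, and the exact differential costs satisfy h(0) - h(1) = -2.5.
P = [[0.6, 0.4], [0.8, 0.2]]
g = [0.0, 3.0]
phi = [[1.0], [0.0]]  # one basis function: indicator of state 0

mu_hat, r_hat = average_cost_td(P, g, phi)
print(mu_hat, r_hat)
```

With this feature choice the differential cost is representable up to a constant, so the estimates should settle near the exact values noted in the comments. With a poorer basis only an approximation is obtained, which is the case the error bound in the abstract addresses.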
Pages: 498 - 502
Number of pages: 5
Related papers
50 in total
  • [21] Target-Based Temporal-Difference Learning
    Lee, Donghwan
    He, Niao
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [22] New Versions of Gradient Temporal-Difference Learning
    Lee, Donghwan
    Lim, Han-Dong
    Park, Jihoon
    Choi, Okyong
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5006 - 5013
  • [23] On Generalized Bellman Equations and Temporal-Difference Learning
    Yu, Huizhen
    Mahmood, Ashique Rupam
    Sutton, Richard S.
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2017, 2017, 10233 : 3 - 14
  • [24] Postponed Updates for Temporal-Difference Reinforcement Learning
    van Seijen, Harm
    Whiteson, Shimon
    2009 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2009, : 665 - +
  • [25] On the convergence of temporal-difference learning with linear function approximation
    Tadic, V
    MACHINE LEARNING, 2001, 42 (03) : 241 - 267
  • [26] Optimal Active Fault Diagnosis by Temporal-Difference Learning
    Skach, Jan
    Puncochar, Ivo
    Lewis, Frank L.
    2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016, : 2146 - 2151
  • [27] Temporal-Difference Learning with Sampling Baseline for Image Captioning
    Chen, Hui
    Ding, Guiguang
    Zhao, Sicheng
    Han, Jungong
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6706 - 6713
  • [28] On the Convergence of Temporal-Difference Learning with Linear Function Approximation
    Vladislav Tadić
    Machine Learning, 2001, 42 : 241 - 267
  • [29] Neural Temporal-Difference Learning Converges to Global Optima
    Cai, Qi
    Yang, Zhuoran
    Lee, Jason D.
    Wang, Zhaoran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [30] Temporal-Difference Networks with History
    Tanner, Brian
    Sutton, Richard S.
    19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 865 - 870