Average cost temporal-difference learning

Cited by: 0
Authors
Tsitsiklis, JN [1 ]
Van Roy, B [1 ]
Affiliation
[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
Keywords
DOI
Not available
CLC classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations consist of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. These results parallel the authors' previous work on approximations of discounted cost-to-go.
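The abstract does not state the update equations, so the following is only a minimal sketch of one commonly cited form of average-cost TD(λ) with linear function approximation, not the authors' exact algorithm. The function name `average_cost_td`, the three-state toy chain, the cost vector, the basis functions, and both step-size schedules are illustrative assumptions. The sketch keeps a running estimate `mu` of the average cost and a weight vector `r` for the differential cost, and updates both along a single simulated trajectory.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def average_cost_td(P, g, phi, lam=0.5, steps=200_000, seed=0):
    """Sketch of average-cost TD(lambda) on one trajectory of a Markov chain.

    P   : row-stochastic transition matrix (list of rows)
    g   : per-state cost
    phi : per-state feature vector (the fixed basis functions)
    Returns (mu, r): average-cost estimate and differential-cost weights.
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    r = [0.0] * len(phi[0])   # weights of the linear approximation
    mu = 0.0                  # running estimate of the average cost
    x = 0                     # assumed start state
    z = list(phi[x])          # eligibility trace, starts at phi(x_0)
    for t in range(steps):
        x_next = rng.choices(states, weights=P[x])[0]
        gamma = 20.0 / (200.0 + t)   # assumed diminishing step size for r
        eta = 1.0 / (t + 1)          # assumed step size: mu is a sample mean
        # Temporal difference for the differential cost.
        d = g[x] - mu + dot(phi[x_next], r) - dot(phi[x], r)
        r = [ri + gamma * d * zi for ri, zi in zip(r, z)]
        mu = (1.0 - eta) * mu + eta * g[x]
        z = [lam * zi + fi for zi, fi in zip(z, phi[x_next])]
        x = x_next
    return mu, r


# Hypothetical toy problem: an irreducible, aperiodic, doubly stochastic chain,
# so the stationary distribution is uniform and the true average cost is 1.0.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
g = [0.0, 1.0, 2.0]
# Two basis functions whose span does not contain the constant function.
phi = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

mu_hat, r_hat = average_cost_td(P, g, phi)
print("average-cost estimate:", mu_hat)
print("differential-cost weights:", r_hat)
```

In this toy setup the basis functions pin the approximate differential cost of the third state to zero, so the learned weights describe the other two states relative to it. The differential cost is only defined up to an additive constant, which is why the basis is chosen not to span the constant function.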
Pages: 498-502
Page count: 5
Related papers
50 in total
  • [11] Loosely Consistent Emphatic Temporal-Difference Learning
    He, Jiamin
    Che, Fengdi
    Wan, Yi
    Mahmood, A. Rupam
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 849 - 859
  • [12] Gradient Temporal-Difference Learning with Regularized Corrections
    Ghiassian, Sina
    Patterson, Andrew
    Garg, Shivam
    Gupta, Dhawal
    White, Adam
    White, Martha
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [13] Relative Loss Bounds for Temporal-Difference Learning
    Jürgen Forster
    Manfred K. Warmuth
    Machine Learning, 2003, 51 : 23 - 50
  • [14] Temporal-Difference Reinforcement Learning with Distributed Representations
    Kurth-Nelson, Zeb
    Redish, A. David
    PLOS ONE, 2009, 4 (10):
  • [15] Nonlinear Distributional Gradient Temporal-Difference Learning
    Qu, Chao
    Mannor, Shie
    Xu, Huan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [16] Gradient Temporal-Difference Learning with Regularized Corrections
    Ghiassian, Sina
    Patterson, Andrew
    Garg, Shivam
    Gupta, Dhawal
    White, Adam
    White, Martha
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [17] An analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    Van Roy, B
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (05) : 674 - 690
  • [18] Relative loss bounds for temporal-difference learning
    Forster, J
    Warmuth, MK
    MACHINE LEARNING, 2003, 51 (01) : 23 - 50
  • [19] Analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    Van Roy, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 9: PROCEEDINGS OF THE 1996 CONFERENCE, 1997, 9 : 1075 - 1081
  • [20] Approximate value iteration and temporal-difference learning
    de Farias, DP
    Van Roy, B
    IEEE 2000 ADAPTIVE SYSTEMS FOR SIGNAL PROCESSING, COMMUNICATIONS, AND CONTROL SYMPOSIUM - PROCEEDINGS, 2000, : 48 - 51