Average cost temporal-difference learning

Cited by: 0
Authors
Tsitsiklis, JN [1 ]
Van Roy, B [1 ]
Affiliation
[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We describe a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally along a single endless trajectory of the Markov chain. We present results on convergence and on the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel the authors' previous work on approximations of discounted cost-to-go.
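The kind of update the abstract describes can be illustrated with a minimal sketch. It is not taken from the paper. The eligibility-trace parameter `lam`, the step-size schedule, the per-state cost vector `g`, and the two-state example chain are all assumptions chosen for illustration.

```python
import random


def average_cost_td(P, g, phi, steps=200_000, lam=0.0, seed=0):
    """Sketch of average-cost TD(lambda) with linear features.

    P[i][j] is the transition probability from state i to j, g[i] is the
    cost incurred in state i, and phi[i] is the feature vector of state i.
    Returns (mu, r): the average-cost estimate and the feature weights.
    """
    rng = random.Random(seed)
    n_states = len(P)
    mu = 0.0                   # running estimate of the average cost
    r = [0.0] * len(phi[0])    # weights of the fixed basis functions
    i = 0                      # arbitrary start state
    z = list(phi[i])           # eligibility trace, initialised to phi(i_0)
    for t in range(steps):
        # Sample the next state of the single trajectory.
        j = rng.choices(range(n_states), weights=P[i])[0]
        gamma = 10.0 / (100.0 + t)  # assumed diminishing step size
        v_i = sum(w * f for w, f in zip(r, phi[i]))
        v_j = sum(w * f for w, f in zip(r, phi[j]))
        # Temporal difference: cost minus average-cost estimate,
        # plus the change in the approximate differential cost.
        d = g[i] - mu + v_j - v_i
        r = [w + gamma * d * e for w, e in zip(r, z)]
        mu = (1.0 - gamma) * mu + gamma * g[i]
        z = [lam * e + f for e, f in zip(z, phi[j])]
        i = j
    return mu, r


# Hypothetical two-state chain, irreducible and aperiodic.
P = [[0.9, 0.1], [0.5, 0.5]]
g = [1.0, 4.0]
phi = [[1.0], [0.0]]   # one basis function; constants are not in its span
mu, r = average_cost_td(P, g, phi)
print(mu, r)
```

For this example chain the stationary distribution is (5/6, 1/6), so the exact average cost is 1.5, and the exact differential costs satisfy h(0) - h(1) = -5. Since the single basis function together with the constant vector spans every function on two states, the estimates `mu` and `r[0]` should approach these values as the trajectory grows.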
Pages: 498-502
Page count: 5