Average cost temporal-difference learning

Cited by: 0

Authors:

Tsitsiklis, JN [1]

Van Roy, B [1]

Affiliation:

[1] MIT, Laboratory for Information and Decision Systems, Cambridge, MA 02139 USA

Source:

Proceedings of the 36th IEEE Conference on Decision and Control, Vols. 1-5 | 1997

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We describe a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions, whose weights are updated incrementally along a single endless trajectory of the Markov chain. We present results on convergence and on the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel the authors' previous work on approximating the discounted cost-to-go.
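The incremental scheme described in the abstract can be illustrated with a short Python sketch. It follows the general form of average-cost TD(λ) with linear features: alongside the weight vector r_t, the algorithm tracks an average-cost estimate μ_t and uses the temporal difference d_t = g(i_t) - μ_t + φ(i_{t+1})'r_t - φ(i_t)'r_t together with an eligibility trace z_{t+1} = λ z_t + φ(i_{t+1}). Everything concrete here is an illustrative assumption rather than the authors' setup: the hypothetical helper names `average_cost_td` and `stationary_average_cost`, the 3-state chain, the costs and features, the step-size schedule γ_t = c/(t0 + t), the choice η_t = γ_t for the average-cost step, and λ = 0.5.

```python
import random


def stationary_average_cost(P, g, iters=1000):
    """Average cost sum_i pi(i) g(i), with pi found by power iteration on P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(p * c for p, c in zip(pi, g))


def average_cost_td(P, g, phi, lam=0.5, steps=200_000, c=10.0, t0=100.0, seed=0):
    """Sketch of average-cost TD(lambda) with linear features on one trajectory.

    P: transition matrix (list of rows), g: per-state cost, phi: per-state features.
    Returns (mu, r): the average-cost estimate and the feature weights.
    """
    rng = random.Random(seed)
    n, k = len(P), len(phi[0])
    r = [0.0] * k              # weights of the differential-cost approximation
    mu = 0.0                   # running estimate of the average cost
    state = 0
    z = list(phi[state])       # eligibility trace, z_0 = phi(i_0)
    for t in range(steps):
        gamma = c / (t0 + t)   # assumed diminishing step size for r
        eta = gamma            # assumed step size for mu (eta_t = gamma_t)
        nxt = rng.choices(range(n), weights=P[state])[0]
        v_now = sum(w * f for w, f in zip(r, phi[state]))
        v_next = sum(w * f for w, f in zip(r, phi[nxt]))
        d = g[state] - mu + v_next - v_now               # temporal difference d_t
        r = [w + gamma * d * e for w, e in zip(r, z)]    # r_{t+1} = r_t + gamma_t d_t z_t
        mu = (1.0 - eta) * mu + eta * g[state]           # average-cost update
        z = [lam * e + f for e, f in zip(z, phi[nxt])]   # z_{t+1} = lam z_t + phi(i_{t+1})
        state = nxt
    return mu, r


if __name__ == "__main__":
    # Illustrative 3-state chain: irreducible, and aperiodic thanks to self-loops.
    P = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.5, 0.0, 0.5]]
    g = [1.0, 2.0, 3.0]
    # State 2 gets the zero feature vector, so the all-ones vector is not in the span.
    phi = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
    mu, r = average_cost_td(P, g, phi)
    print("TD estimate of average cost:", mu)
    print("exact average cost:", stationary_average_cost(P, g))
    print("weights (approx. differential costs of states 0 and 1):", r)
```

With these illustrative inputs the chain is doubly stochastic, so its stationary distribution is uniform and the true average cost is 2. The differential cost normalized to zero at state 2 is (-2, 0, 0), which lies in the span of the chosen features, so μ should settle near 2 and r near (-2, 0). State 2 is given the zero feature vector to keep the all-ones vector out of the feature span, a condition that convergence analyses of this kind typically require.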


Pages: 498-502

Page count: 5

Related literature (50 records in total):

[11] He, Jiamin; Che, Fengdi; Wan, Yi; Mahmood, A. Rupam. Loosely Consistent Emphatic Temporal-Difference Learning. Uncertainty in Artificial Intelligence, 2023, 216: 849-859.
[12] Ghiassian, Sina; Patterson, Andrew; Garg, Shivam; Gupta, Dhawal; White, Adam; White, Martha. Gradient Temporal-Difference Learning with Regularized Corrections. International Conference on Machine Learning, Vol. 119, 2020.
[13] Forster, Jürgen; Warmuth, Manfred K. Relative Loss Bounds for Temporal-Difference Learning. Machine Learning, 2003, 51: 23-50.
[14] Kurth-Nelson, Zeb; Redish, A. David. Temporal-Difference Reinforcement Learning with Distributed Representations. PLOS ONE, 2009, 4(10).
[15] Qu, Chao; Mannor, Shie; Xu, Huan. Nonlinear Distributional Gradient Temporal-Difference Learning. International Conference on Machine Learning, Vol. 97, 2019.
[16] Ghiassian, Sina; Patterson, Andrew; Garg, Shivam; Gupta, Dhawal; White, Adam; White, Martha. Gradient Temporal-Difference Learning with Regularized Corrections. 25th Americas Conference on Information Systems (AMCIS 2019), 2019.
[17] Tsitsiklis, JN; Van Roy, B. An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control, 1997, 42(5): 674-690.
[18] Forster, J; Warmuth, MK. Relative Loss Bounds for Temporal-Difference Learning. Machine Learning, 2003, 51(1): 23-50.
[19] Tsitsiklis, JN; Van Roy, B. Analysis of Temporal-Difference Learning with Function Approximation. Advances in Neural Information Processing Systems 9: Proceedings of the 1996 Conference, 1997, 9: 1075-1081.
[20] de Farias, DP; Van Roy, B. Approximate Value Iteration and Temporal-Difference Learning. IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium: Proceedings, 2000: 48-51.
