Average cost temporal-difference learning

Cited by: 0

Authors:

Tsitsiklis, JN [1]

Van Roy, B [1]

Affiliations:

[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA

Source:

PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 1997

Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are comprised of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel previous work by the authors, involving approximations of discounted cost-to-go.
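The abstract does not state the update rules, so the following is only a minimal sketch of what an average-cost TD(λ) iteration with linear function approximation might look like, assuming the commonly cited form: a running estimate `mu` of the average cost, a weight vector `r` for the differential cost, and an eligibility trace `z`. The step-size schedule (`a`, `b`), the use of a plain running average for `mu`, and the two-state example chain are illustrative assumptions, not details taken from this record.

```python
import random


def average_cost_td(P, g, phi, lam=0.5, steps=200000, a=50.0, b=1000.0, seed=0):
    """Sketch of average-cost TD(lambda) along a single trajectory.

    P   : row-stochastic transition matrix (list of lists)
    g   : per-state cost, g[x]
    phi : per-state feature vector, phi[x] (list of k floats)
    a, b: hypothetical step-size parameters, gamma_t = a / (b + t)
    Returns (mu, r): average-cost estimate and differential-cost weights.
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    k = len(phi[0])

    x = 0                 # arbitrary initial state
    mu = 0.0              # average-cost estimate
    r = [0.0] * k         # weights of the linear approximation
    z = list(phi[x])      # eligibility trace, z_0 = phi(x_0)

    for t in range(steps):
        x_next = rng.choices(states, weights=P[x])[0]

        # Approximate differential costs at the current and next state.
        v = sum(p * w for p, w in zip(phi[x], r))
        v_next = sum(p * w for p, w in zip(phi[x_next], r))

        # Temporal difference: the cost is measured relative to mu,
        # and there is no discount factor.
        d = g[x] - mu + v_next - v

        gamma = a / (b + t)
        r = [w + gamma * d * zi for w, zi in zip(r, z)]

        # Assumed choice: mu is the running average of observed costs.
        mu += (g[x] - mu) / (t + 1)

        # Decay the trace and add the features of the new state.
        z = [lam * zi + p for zi, p in zip(z, phi[x_next])]
        x = x_next

    return mu, r


# Hypothetical two-state irreducible aperiodic chain with one basis function.
P = [[0.9, 0.1], [0.2, 0.8]]
g = [1.0, 4.0]
phi = [[0.0], [1.0]]   # the constant function is not in the span

mu, r = average_cost_td(P, g, phi)
print("average cost estimate:", mu)
print("differential cost weights:", r)
```

For this example chain the stationary distribution is (2/3, 1/3), so the true average cost is 2 and the differential costs of the two states differ by 10; the estimates can be compared against those values.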


Pages: 498-502

Number of pages: 5
