CONVERGENCE ANALYSIS ON TEMPORAL DIFFERENCE LEARNING

Citations: 0

Authors:

Leng, Jinsong [1]

Jain, Lakhmi [1]

Fyfe, Colin

Affiliations:

[1] Univ S Australia, Sch Elect & Informat Engn, Mawson Lakes, SA 5095, Australia

Source:

INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL | 2009, Vol. 5, No. 04

Keywords:

Temporal difference learning; Agent; Convergence analysis; APPROXIMATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning to act in an uncertain environment without external instruction is considered one of the fundamental features of intelligence. Temporal difference (TD) learning is an incremental learning approach that has been widely used in various application domains. Utilising eligibility traces is an important mechanism for enhancing learning ability. For large, stochastic and dynamic systems, however, the TD method suffers from two problems: the state space grows exponentially with the number of state variables (the curse of dimensionality), and there is a lack of methodology to analyse the convergence and sensitivity of TD algorithms. Measuring learning performance and analysing the sensitivity of parameters are difficult and expensive, because such performance metrics are obtained only by running an extensive set of experiments with different parameter values. In this paper, convergence is investigated using performance metrics obtained by simulating a game of soccer. The Sarsa(lambda) learning control algorithm, in conjunction with a linear function approximation technique known as tile coding, is used to help soccer agents learn the optimal control processes. This paper proposes a methodology for finding the optimal parameter values to improve the quality of convergence.
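To make the combination named in the abstract concrete, the following is a minimal sketch of one Sarsa(lambda) update with binary tile-coded features and a linear action-value function. It is not the authors' implementation: the one-dimensional state, the constants `N_TILINGS` and `TILES_PER_TILING`, the function names `active_tiles` and `sarsa_lambda_step`, and the use of replacing traces are all assumptions made for illustration.

```python
import numpy as np

N_TILINGS = 4          # assumed number of overlapping tilings
TILES_PER_TILING = 8   # assumed resolution of each tiling
# One spare tile per tiling so that shifted tilings still cover the upper bound.
N_FEATURES = N_TILINGS * (TILES_PER_TILING + 1)


def active_tiles(x, low=0.0, high=1.0):
    """Return one active feature index per tiling for a scalar x in [low, high]."""
    scaled = (x - low) / (high - low) * TILES_PER_TILING
    tiles = []
    for t in range(N_TILINGS):
        offset = t / N_TILINGS  # each tiling is shifted by a fraction of a tile
        tile = min(int(np.floor(scaled + offset)), TILES_PER_TILING)
        tiles.append(t * (TILES_PER_TILING + 1) + tile)
    return tiles


def sarsa_lambda_step(w, z, tiles, a, r, next_tiles, next_a,
                      alpha, gamma, lam, done):
    """Apply one Sarsa(lambda) update in place; w and z have shape (actions, features)."""
    q = w[a, tiles].sum()                                   # linear estimate Q(s, a)
    q_next = 0.0 if done else w[next_a, next_tiles].sum()   # Q(s', a'), zero at terminal
    delta = r + gamma * q_next - q                          # TD error
    z *= gamma * lam                                        # decay all eligibility traces
    z[a, tiles] = 1.0                                       # replacing traces (assumption)
    w += alpha * delta * z                                  # credit recently active features
    return delta


# Hypothetical usage: two actions, a single terminal transition with reward 1.
w = np.zeros((2, N_FEATURES))
z = np.zeros_like(w)
tiles = active_tiles(0.5)
delta = sarsa_lambda_step(w, z, tiles, a=0, r=1.0, next_tiles=tiles, next_a=0,
                          alpha=0.1 / N_TILINGS, gamma=0.9, lam=0.8, done=True)
```

Dividing the step size by the number of tilings is a common convention with tile coding, so that a single update moves the estimate by roughly a fraction `0.1` of the TD error. The step size `alpha`, the discount `gamma`, and the trace decay `lam` are the kind of parameters whose values would be varied across repeated runs in a sensitivity study of the sort the abstract describes.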

Pages: 913-922

Page count: 10

50 items in total

[1] On the Distributional Convergence of Temporal Difference Learning
Dai, Jie
Chen, Xuguang
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 439 - 454
[2] A Method of Accelerating the Convergence of Temporal Difference Learning
He B.
Liu Q.
Zhang L.-L.
Shi S.-M.
Chen H.-M.
Yan Y.
Zidonghua Xuebao/Acta Automatica Sinica, 2021, 47 (07) : 1679 - 1688
[3] On the Convergence of Temporal-Difference Learning with Linear Function Approximation
Vladislav Tadić
Machine Learning, 2001, 42 : 241 - 267
[4] Convergence of model-based temporal difference learning for control
Van Hasselt, Hado
Wiering, Marco A.
2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 60 - +
[5] On the convergence of temporal-difference learning with linear function approximation
Tadic, V
MACHINE LEARNING, 2001, 42 (03) : 241 - 267
[6] Analysis of experience replay in temporal difference learning
Cichosz, Pawel
Cybernetics and Systems, 30 (05) : 341 - 363
[7] An analysis of experience replay in temporal difference learning
Cichosz, P
CYBERNETICS AND SYSTEMS, 1999, 30 (05) : 341 - 363
[8] An Analysis of Quantile Temporal-Difference Learning
Rowland, Mark
Munos, Remi
Azar, Mohammad Gheshlaghi
Tang, Yunhao
Ostrovski, Georg
Harutyunyan, Anna
Tuyls, Karl
Bellemare, Marc G.
Dabney, Will
JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
[9] On the mean-square rate of convergence of temporal-difference learning algorithms
Tadic, VB
PROCEEDINGS OF THE 2002 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 2002, 1-6 : 1454 - 1459
[10] An Exemplar Test Problem on Parameter Convergence Analysis of Temporal Difference Algorithms
Brown, Martin
Tutsoy, Onder
PROCEEDINGS OF THE 10TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA 2012), 2012, : 2925 - 2930
