CONVERGENCE ANALYSIS ON TEMPORAL DIFFERENCE LEARNING

Cited by: 0
Authors
Leng, Jinsong [1 ]
Jain, Lakhmi [1 ]
Fyfe, Colin
Affiliation
[1] Univ S Australia, Sch Elect & Informat Engn, Mawson Lakes, SA 5095, Australia
Keywords
Temporal difference learning; Agent; Convergence analysis; APPROXIMATION;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning to act in an uncertain environment without external instruction is considered one of the fundamental features of intelligence. Temporal difference (TD) learning is an incremental learning approach and has been widely used in various application domains. Utilising eligibility traces is an important mechanism for enhancing learning ability. For large, stochastic, and dynamic systems, however, the TD method suffers from two problems: the state space grows exponentially with the curse of dimensionality, and there is a lack of methodology for analysing the convergence and sensitivity of TD algorithms. Measuring learning performance and analysing parameter sensitivity are difficult and expensive, and such performance metrics are obtained only by running an extensive set of experiments with different parameter values. In this paper, convergence is investigated via performance metrics obtained by simulating a game of soccer. The Sarsa(lambda) learning control algorithm, in conjunction with a linear function approximation technique known as tile coding, is used to help soccer agents learn the optimal control processes. This paper proposes a methodology for finding the optimal parameter values to improve the quality of convergence.
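The abstract describes Sarsa(lambda) with eligibility traces and linear function approximation. The sketch below illustrates that update rule in minimal form; it is not the paper's soccer simulation. The toy chain environment, the one-hot features standing in for tile coding, and the parameter values (alpha, gamma, lambda, epsilon) are illustrative assumptions only.

```python
# Minimal sketch of Sarsa(lambda) with linear function approximation and
# accumulating eligibility traces. Environment and parameters are assumed
# for illustration; the paper's tile-coded soccer domain is not reproduced.
import numpy as np

N_STATES = 10          # toy chain of states (assumption)
N_ACTIONS = 2          # 0 = left, 1 = right
ALPHA = 0.1            # step size (assumed value)
GAMMA = 0.95           # discount factor (assumed value)
LAMBDA = 0.8           # trace-decay parameter (assumed value)
EPSILON = 0.1          # exploration rate (assumed value)

def features(state, action):
    """One-hot feature vector over (state, action) pairs; a simple
    stand-in for tile-coding features."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[state * N_ACTIONS + action] = 1.0
    return phi

def q_value(w, state, action):
    # Linear action-value estimate: Q(s, a) = w . phi(s, a)
    return float(np.dot(w, features(state, action)))

def epsilon_greedy(w, state, rng):
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([q_value(w, state, a) for a in range(N_ACTIONS)]))

def step(state, action):
    """Toy dynamics: reward 1 on reaching the rightmost state, else 0."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def sarsa_lambda(n_episodes=200, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)               # linear weights
    for _ in range(n_episodes):
        z = np.zeros_like(w)                         # eligibility traces
        state, done = 0, False
        action = epsilon_greedy(w, state, rng)
        while not done:
            next_state, reward, done = step(state, action)
            delta = reward - q_value(w, state, action)       # TD error
            next_action = None
            if not done:
                next_action = epsilon_greedy(w, next_state, rng)
                delta += GAMMA * q_value(w, next_state, next_action)
            z = GAMMA * LAMBDA * z + features(state, action)  # accumulate trace
            w += ALPHA * delta * z                            # TD(lambda) update
            state, action = next_state, next_action
    return w

if __name__ == "__main__":
    print(np.round(sarsa_lambda(), 2))
```

The convergence and sensitivity questions raised in the abstract correspond to how the learned weights behave as alpha, lambda, and epsilon are varied across repeated runs of a loop like this one.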
Pages: 913-922
Page count: 10
Related Papers
50 records in total
  • [21] Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
    Khamaru, Koulik
    Pananjady, Ashwin
    Ruan, Feng
    Wainwright, Martin J.
    Jordan, Michael I.
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2021, 3 (04): : 1013 - 1040
  • [22] Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
    Tutsoy, Onder
    Brown, Martin
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2016, 37 (01): : 108 - 126
  • [23] Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize
    Yu, Huizhen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [24] A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
    Bhandari, Jalaj
    Russo, Daniel
    Singal, Raghav
    OPERATIONS RESEARCH, 2021, 69 (03) : 950 - 973
  • [25] On the worst-case analysis of temporal-difference learning algorithms
    Schapire, RE
    Warmuth, MK
    MACHINE LEARNING, 1996, 22 (1-3) : 95 - 121
  • [26] Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
    Hallak, Assaf
    Tamar, Aviv
    Munos, Remi
    Mannor, Shie
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1631 - 1637
  • [27] Hyperbolically Discounted Temporal Difference Learning
    Alexander, William H.
    Brown, Joshua W.
    NEURAL COMPUTATION, 2010, 22 (06) : 1511 - 1527
  • [28] On the Statistical Benefits of Temporal Difference Learning
    Cheikhi, David
    Russo, Daniel
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [29] Linear Observer Learning by Temporal Difference
    Menchetti, Stefano
    Zanon, Mario
    Bemporad, Alberto
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2777 - 2782
  • [30] PRACTICAL ISSUES IN TEMPORAL DIFFERENCE LEARNING
    TESAURO, G
    MACHINE LEARNING, 1992, 8 (3-4) : 257 - 277