CONVERGENCE ANALYSIS ON TEMPORAL DIFFERENCE LEARNING

Cited: 0
Authors
Leng, Jinsong [1 ]
Jain, Lakhmi [1 ]
Fyfe, Colin
Affiliations
[1] Univ S Australia, Sch Elect & Informat Engn, Mawson Lakes, SA 5095, Australia
Keywords
Temporal difference learning; Agent; Convergence analysis; Approximation
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning to act in an uncertain environment without external instruction is considered one of the fundamental features of intelligence. Temporal difference (TD) learning is an incremental learning approach that has been widely used in various application domains. Utilising eligibility traces is an important mechanism for enhancing learning ability. For large, stochastic and dynamic systems, however, the TD method suffers from two problems: the state space grows exponentially with the curse of dimensionality, and there is a lack of methodology for analysing the convergence and sensitivity of TD algorithms. Measuring learning performance and analysing parameter sensitivity are difficult and expensive, as such performance metrics can only be obtained by running an extensive set of experiments with different parameter values. In this paper, convergence is investigated through performance metrics obtained by simulating a game of soccer. The Sarsa(λ) learning control algorithm, in conjunction with a linear function approximation technique known as tile coding, is used to help soccer agents learn the optimal control processes. This paper proposes a methodology for finding the optimal parameter values to improve the quality of convergence.
Pages: 913 - 922
Number of pages: 10
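
The abstract describes Sarsa(λ) control with eligibility traces and linear function approximation via tile coding. Below is a minimal, illustrative Python sketch of a generic Sarsa(λ) update with accumulating traces; it is not the authors' implementation. The class name SarsaLambdaAgent, the random binary features standing in for tile coding, and the parameter values (alpha, gamma, lambda, epsilon) are assumptions made purely for illustration.

import numpy as np

class SarsaLambdaAgent:
    """Minimal Sarsa(lambda) with linear function approximation and
    accumulating eligibility traces (a generic sketch, not the paper's code)."""

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=0.99,
                 lam=0.9, epsilon=0.1, seed=0):
        self.n_features = n_features
        self.n_actions = n_actions
        self.alpha = alpha          # step size (assumed value)
        self.gamma = gamma          # discount factor (assumed value)
        self.lam = lam              # trace-decay parameter lambda (assumed value)
        self.epsilon = epsilon      # exploration rate (assumed value)
        self.rng = np.random.default_rng(seed)
        # One weight vector and one eligibility-trace vector per action.
        self.w = np.zeros((n_actions, n_features))
        self.z = np.zeros((n_actions, n_features))

    def q(self, phi, a):
        """Action value as a linear function of the feature vector phi."""
        return self.w[a] @ phi

    def act(self, phi):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax([self.q(phi, a) for a in range(self.n_actions)]))

    def update(self, phi, a, r, phi_next, a_next, done):
        """One Sarsa(lambda) step: TD error plus eligibility-trace update."""
        target = r if done else r + self.gamma * self.q(phi_next, a_next)
        delta = target - self.q(phi, a)
        self.z *= self.gamma * self.lam      # decay all traces
        self.z[a] += phi                     # accumulate trace for the taken action
        self.w += self.alpha * delta * self.z
        if done:
            self.z[:] = 0.0                  # reset traces at episode end


# Toy usage with random binary features standing in for tile coding.
if __name__ == "__main__":
    agent = SarsaLambdaAgent(n_features=16, n_actions=3)
    rng = np.random.default_rng(1)
    phi = (rng.random(16) < 0.25).astype(float)
    a = agent.act(phi)
    for t in range(100):
        r = rng.normal()                                 # dummy reward
        phi_next = (rng.random(16) < 0.25).astype(float)  # dummy next features
        a_next = agent.act(phi_next)
        agent.update(phi, a, r, phi_next, a_next, done=(t == 99))
        phi, a = phi_next, a_next

In the paper's setting, the step-size, discount, trace-decay and exploration parameters are exactly the values whose sensitivity the proposed methodology examines to improve the quality of convergence; the toy feature vectors above merely stand in for the tile-coding features used by the soccer agents.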