CONVERGENCE ANALYSIS ON TEMPORAL DIFFERENCE LEARNING

Cited: 0
Authors
Leng, Jinsong [1]
Jain, Lakhmi [1]
Fyfe, Colin
Affiliations
[1] Univ S Australia, Sch Elect & Informat Engn, Mawson Lakes, SA 5095, Australia
Keywords
Temporal difference learning; Agent; Convergence analysis; APPROXIMATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning to act in an uncertain environment without external instruction is considered one of the fundamental features of intelligence. Temporal difference (TD) learning is an incremental learning approach that has been widely used in various application domains. Utilising eligibility traces is an important mechanism for enhancing learning ability. For large, stochastic and dynamic systems, however, the TD method suffers from two problems: the state space grows exponentially with the curse of dimensionality, and there is a lack of methodology for analysing the convergence and sensitivity of TD algorithms. Measuring learning performance and analysing parameter sensitivity are difficult and expensive, and such performance metrics are obtained only by running an extensive set of experiments with different parameter values. In this paper, convergence is investigated via performance metrics obtained by simulating a game of soccer. The Sarsa(lambda) learning control algorithm, in conjunction with a linear function approximation technique known as tile coding, is used to help soccer agents learn the optimal control processes. This paper proposes a methodology for finding the optimal parameter values to improve the quality of convergence.
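The abstract names the core algorithm but gives no code. The following is a minimal sketch of Sarsa(lambda) with accumulating eligibility traces and a linear function approximator, illustrating the general technique the paper applies; the toy chain environment, the one-hot features (the simplest degenerate case of tile coding), and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random

N_STATES = 5          # toy chain: states 0..4, state 4 is the goal (assumption)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic chain dynamics: cost of -1 per step until the goal."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (0.0 if done else -1.0), done

def features(state, action):
    """One-hot state-action features; tile coding generalises this idea."""
    phi = [0.0] * (N_STATES * len(ACTIONS))
    phi[state * len(ACTIONS) + action] = 1.0
    return phi

def q_value(w, state, action):
    """Linear approximation: Q(s,a) = w . phi(s,a)."""
    return sum(wi * xi for wi, xi in zip(w, features(state, action)))

def epsilon_greedy(w, state, eps, rng):
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(w, state, a))

def sarsa_lambda(episodes=200, alpha=0.1, gamma=0.95, lam=0.8, eps=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0] * (N_STATES * len(ACTIONS))
    for _ in range(episodes):
        z = [0.0] * len(w)                       # eligibility traces
        s = 0
        a = epsilon_greedy(w, s, eps, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(w, s2, eps, rng)
            # TD error: delta = r + gamma * Q(s',a') - Q(s,a)
            target = r if done else r + gamma * q_value(w, s2, a2)
            delta = target - q_value(w, s, a)
            phi = features(s, a)
            for i in range(len(w)):
                z[i] = gamma * lam * z[i] + phi[i]   # accumulate traces
                w[i] += alpha * delta * z[i]         # credit earlier steps too
            s, a = s2, a2
    return w

w = sarsa_lambda()
print(q_value(w, 0, 1), q_value(w, 0, 0))  # moving right should score higher
```

The trace vector `z` is what distinguishes Sarsa(lambda) from one-step Sarsa: each TD error updates not only the current state-action features but every recently visited one, with credit decaying by `gamma * lam` per step. The sensitivity of convergence to `alpha`, `lam`, and `eps` is exactly the parameter-tuning question the paper's methodology addresses.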
Pages: 913-922
Page count: 10
Related Papers
(50 in total)
  • [31] Schizophrenia, dopamine and temporal difference learning
    Thurnham, AJ
    Done, DJ
    Davey, N
    Frank, RJ
    Doughty, OJ
    SCHIZOPHRENIA RESEARCH, 2006, 81 : 120 - 120
  • [32] Temporal difference learning in network routing
    Broadbent, R
    Deccio, CT
    Clement, M
    CIC '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN COMPUTING, 2004, : 14 - 20
  • [33] Optimistic Temporal Difference Learning for 2048
    Guei, Hung
    Chen, Lung-Pin
    Wu, I-Chen
    IEEE TRANSACTIONS ON GAMES, 2022, 14 (03) : 478 - 487
  • [34] Prospective and retrospective temporal difference learning
    Dayan, Peter
    NETWORK-COMPUTATION IN NEURAL SYSTEMS, 2009, 20 (01) : 32 - 46
  • [35] A temporal difference account of avoidance learning
    Moutoussis, Michael
    Bentall, Richard P.
    Williams, Jonathan
    Dayan, Peter
    NETWORK-COMPUTATION IN NEURAL SYSTEMS, 2008, 19 (02) : 137 - 160
  • [36] Interference and Generalization in Temporal Difference Learning
    Bengio, Emmanuel
    Pineau, Joelle
    Precup, Doina
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [37] Temporal difference coding in reinforcement learning
    Iwata, K
    Ikeda, K
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690 : 218 - 227
  • [38] Accelerated Gradient Temporal Difference Learning
    Pan, Yangchen
    White, Adam
    White, Martha
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2464 - 2470
  • [39] Temporal Difference Learning Waveform Selection
    Wang, Bin
    Wang, Jinkuan
    Song, Xin
    Han, Yinghua
    JOURNAL OF COMPUTERS, 2010, 5 (09) : 1394 - 1401
  • [40] Source Traces for Temporal Difference Learning
    Pitis, Silviu
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3952 - 3959