Research on actor-critic reinforcement learning in RoboCup

Cited: 0
Authors
Guo, He [1 ]
Liu, Tianying [1 ]
Wang, Yuxin [1 ]
Chen, Feng [1 ]
Fan, Jianming [1 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116024, Peoples R China
关键词
reinforcement learning; MAS; actor-critic; RoboCup; function approximation;
DOI
None
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
The Actor-Critic method combines the fast convergence of value-based learning (the Critic) with the directed policy search of policy-gradient methods (the Actor), which makes it well suited to problems with large state spaces. In this paper, the Actor-Critic method with tile-coding linear function approximation is analysed and applied to a RoboCup simulation subtask named "Soccer Keepaway". Experiments on Soccer Keepaway show that the policy learned by the Actor-Critic method outperforms the policies learned by value-based Sarsa(λ) and the benchmark policies.
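To make the method described in the abstract concrete, below is a minimal sketch (Python with NumPy) of one-step Actor-Critic with tile-coding linear function approximation: the Critic learns a state value by TD(0) over the active tiles, and the Actor nudges a softmax policy along the policy gradient, weighted by the Critic's TD error. The tile coder, state size, action set, and step sizes are all illustrative assumptions, not the authors' implementation, and the eligibility traces that Sarsa(λ) and keepaway learners commonly use are omitted for brevity.

```python
import numpy as np

# Illustrative tile-coding setup (all sizes are assumptions, not the paper's).
N_TILINGS = 8        # number of overlapping tilings
TILES_PER_DIM = 6    # tiles per state dimension in each tiling
N_DIMS = 2           # toy 2-D state in [0, 1]^2; real keepaway uses ~13 state variables
N_ACTIONS = 3        # keepaway-style macro-actions, e.g. HoldBall / PassBall(k)
N_FEATURES = N_TILINGS * TILES_PER_DIM ** N_DIMS

def tile_features(state):
    """Return indices of the active tiles for a state in [0, 1]^N_DIMS.

    A toy tile coder: each tiling is shifted by a fraction of a tile
    width; production coders hash over many more dimensions.
    """
    active = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_DIM)
        idx = 0
        for d in range(N_DIMS):
            coord = min(int((state[d] + offset) * TILES_PER_DIM), TILES_PER_DIM - 1)
            idx = idx * TILES_PER_DIM + coord
        active.append(t * TILES_PER_DIM ** N_DIMS + idx)
    return active

v_w = np.zeros(N_FEATURES)                # critic weights: V(s) = sum over active tiles
pi_w = np.zeros((N_ACTIONS, N_FEATURES))  # actor weights: softmax action preferences
ALPHA_V = 0.1 / N_TILINGS                 # step sizes scaled by the number of tilings
ALPHA_PI = 0.01 / N_TILINGS
GAMMA = 1.0                               # keepaway is episodic and undiscounted

def action_probs(active):
    """Softmax policy over the action preferences of the active tiles."""
    prefs = pi_w[:, active].sum(axis=1)
    prefs -= prefs.max()                  # for numerical stability
    e = np.exp(prefs)
    return e / e.sum()

def actor_critic_step(state, action, reward, next_state, done):
    """One transition's worth of learning: the critic estimates V by
    TD(0), and its TD error tells the actor whether to make the taken
    action more or less probable."""
    active = tile_features(state)
    v_s = v_w[active].sum()
    v_next = 0.0 if done else v_w[tile_features(next_state)].sum()
    delta = reward + GAMMA * v_next - v_s  # TD error: the critic's critique

    v_w[active] += ALPHA_V * delta         # critic: move V(s) toward the TD target

    probs = action_probs(active)
    for a in range(N_ACTIONS):             # actor: step along delta * grad log pi
        grad = (1.0 if a == action else 0.0) - probs[a]
        pi_w[a, active] += ALPHA_PI * delta * grad
```

In an episode loop, an agent would sample an action from action_probs(tile_features(state)), execute it in the simulator, and call actor_critic_step on each resulting transition.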
Pages: 205 - 205
Number of pages: 1
Related Papers
14 references in total
  • [1] Infinite-horizon policy-gradient estimation
    Baxter, J
    Bartlett, PL
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 15 : 319 - 350
  • [2] Experiments with infinite-horizon, policy-gradient estimation
    Baxter, J
    Bartlett, PL
    Weaver, L
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 15 : 351 - 381
  • [3] Konda, V.R., Tsitsiklis, J.N. Actor-critic algorithms. In: Advances in Neural Information Processing Systems, 1999
  • [4] Simulation-based optimization of Markov reward processes
    Marbach, P
    Tsitsiklis, JN
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2001, 46 (02) : 191 - 209
  • [5] Riedmiller, M. [J]. Neural Computing & Applications, 2000, 8: 323
  • [6] Reinforcement learning for RoboCup soccer keepaway
    Stone, P
    Sutton, RS
    Kuhlmann, G
    [J]. ADAPTIVE BEHAVIOR, 2005, 13 (03) : 165 - 188
  • [7] Stone, P. In: RoboCup 2000: Robot Soccer World Cup IV, 2001, Vol. 201: 249
  • [8] Stone, P. In: Proceedings of the 5th International Conference on Autonomous Agents, New York, 2001
  • [9] Stone, P. In: Proceedings of the 18th International Conference on Machine Learning, 2001
  • [10] Sutton, R.S., Barto, A.G. Reinforcement Learning: An Introduction. MIT Press, 1998