Research on actor-critic reinforcement learning in RoboCup

Cited: 0
Authors
Guo, He [1 ]
Liu, Tianying [1 ]
Wang, Yuxin [1 ]
Chen, Feng [1 ]
Fan, Jianming [1 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116024, Peoples R China
关键词
reinforcement learning; MAS; actor-critic; RoboCup; function approximation;
DOI
None
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
The Actor-Critic method combines the fast convergence of value-based learning (the Critic) with the directed policy search of policy-gradient methods (the Actor), which makes it well suited to problems with large state spaces. In this paper, the Actor-Critic method with tile-coding linear function approximation is analysed and applied to a RoboCup simulation subtask named "Soccer Keepaway". Experiments on Soccer Keepaway show that the policy learned by the Actor-Critic method outperforms the policies learned by value-based Sarsa(λ) and the benchmark policies.
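To make the method described in the abstract concrete, below is a minimal sketch (Python with NumPy) of one-step Actor-Critic with tile-coding linear function approximation: the Critic learns a state value by TD(0) over the active tiles, and the Actor nudges a softmax policy along the policy gradient, weighted by the Critic's TD error. The tile coder, state size, action set, and step sizes are all illustrative assumptions, not the authors' implementation, and the eligibility traces that Sarsa(λ) and keepaway learners commonly use are omitted for brevity.

```python
import numpy as np

# Illustrative tile-coding setup (all sizes are assumptions, not the paper's).
N_TILINGS = 8        # number of overlapping tilings
TILES_PER_DIM = 6    # tiles per state dimension in each tiling
N_DIMS = 2           # toy 2-D state in [0, 1]^2; real keepaway uses ~13 state variables
N_ACTIONS = 3        # keepaway-style macro-actions, e.g. HoldBall / PassBall(k)
N_FEATURES = N_TILINGS * TILES_PER_DIM ** N_DIMS

def tile_features(state):
    """Return indices of the active tiles for a state in [0, 1]^N_DIMS.

    A toy tile coder: each tiling is shifted by a fraction of a tile
    width; production coders hash over many more dimensions.
    """
    active = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_DIM)
        idx = 0
        for d in range(N_DIMS):
            coord = min(int((state[d] + offset) * TILES_PER_DIM), TILES_PER_DIM - 1)
            idx = idx * TILES_PER_DIM + coord
        active.append(t * TILES_PER_DIM ** N_DIMS + idx)
    return active

v_w = np.zeros(N_FEATURES)                # critic weights: V(s) = sum over active tiles
pi_w = np.zeros((N_ACTIONS, N_FEATURES))  # actor weights: softmax action preferences
ALPHA_V = 0.1 / N_TILINGS                 # step sizes scaled by the number of tilings
ALPHA_PI = 0.01 / N_TILINGS
GAMMA = 1.0                               # keepaway is episodic and undiscounted

def action_probs(active):
    """Softmax policy over the action preferences of the active tiles."""
    prefs = pi_w[:, active].sum(axis=1)
    prefs -= prefs.max()                  # for numerical stability
    e = np.exp(prefs)
    return e / e.sum()

def actor_critic_step(state, action, reward, next_state, done):
    """One transition's worth of learning: the critic estimates V by
    TD(0), and its TD error tells the actor whether to make the taken
    action more or less probable."""
    active = tile_features(state)
    v_s = v_w[active].sum()
    v_next = 0.0 if done else v_w[tile_features(next_state)].sum()
    delta = reward + GAMMA * v_next - v_s  # TD error: the critic's critique

    v_w[active] += ALPHA_V * delta         # critic: move V(s) toward the TD target

    probs = action_probs(active)
    for a in range(N_ACTIONS):             # actor: step along delta * grad log pi
        grad = (1.0 if a == action else 0.0) - probs[a]
        pi_w[a, active] += ALPHA_PI * delta * grad
```

In an episode loop, an agent would sample an action from action_probs(tile_features(state)), execute it in the simulator, and call actor_critic_step on each resulting transition.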
Pages: 205 - 205
Number of pages: 1
Related Papers
14 references in total
  • [1] Infinite-horizon policy-gradient estimation
    Baxter, J
    Bartlett, PL
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 15 : 319 - 350
  • [2] Experiments with infinite-horizon, policy-gradient estimation
    Baxter, J
    Bartlett, PL
    Weaver, L
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 15 : 351 - 381
  • [3] Konda, V.R., Tsitsiklis, J.N. Actor-critic algorithms. In: Advances in Neural Information Processing Systems, 1999
  • [4] Simulation-based optimization of Markov reward processes
    Marbach, P
    Tsitsiklis, JN
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2001, 46 (02) : 191 - 209
  • [5] Riedmiller, M. [J]. Neural Computing & Applications, 2000, 8: 323
  • [6] Reinforcement learning for RoboCup soccer keepaway
    Stone, P
    Sutton, RS
    Kuhlmann, G
    [J]. ADAPTIVE BEHAVIOR, 2005, 13 (03) : 165 - 188
  • [7] Stone, P. In: RoboCup 2000: Robot Soccer World Cup IV, 2001, Vol. 201: 249
  • [8] Stone, P. In: Proceedings of the 5th International Conference on Autonomous Agents, New York, 2001
  • [9] Stone, P. In: Proceedings of the 18th International Conference on Machine Learning, 2001
  • [10] Sutton, R.S., Barto, A.G. Reinforcement Learning: An Introduction. MIT Press, 1998