Adaptive play Q-Learning with initial heuristic approximation

Cited by: 4
Authors:
Burkov, Andriy [1]
Chaib-draa, Brahim [1]
Institution:
[1] Univ Laval, Dept Informat & Genie Logiciel, Ste Foy, PQ G1K 7P4, Canada
Keywords:
DOI:
10.1109/ROBOT.2007.363575
CLC number:
TP [automation technology, computer technology];
Subject classification code:
0812
Abstract:
Effective coordination of multiple autonomous robots is one of the most important problems in modern robotics. In turn, learning to coordinate multiple autonomous agents in a multiagent system is among the hardest challenges in state-of-the-art intelligent system design, principally because the dimensionality of the environment grows exponentially with the number of learning agents. This challenge, known as the "curse of dimensionality," arises because each state of the system is a joint state of all agents and each action is a joint action composed of the actions of every agent. In this paper, we address this problem for a restricted class of environments known as goal-directed stochastic games with action-penalty representation. We use a single-agent problem solution as a heuristic approximation of the agents' initial preferences and, by so doing, substantially restrict the space of multiagent learning. We show theoretically the correctness of such an initialization, and experiments in a well-known two-robot grid world problem show a significant reduction in the complexity of the learning process.
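The idea of seeding multiagent learning from a single-agent solution can be illustrated with a minimal sketch. The following Python fragment is a hypothetical illustration, not the paper's code: it solves a one-dimensional corridor with action-penalty representation (every move costs -1 until the goal) by value iteration, then uses the resulting single-agent values to initialize a Q-table over joint states (s1, s2). The corridor size, the joint-state layout, and the initialization rule `Q[(s1, s2), a] = r + V[s1']` are all assumptions made for illustration.

```python
# Hypothetical sketch of heuristic Q-initialization (not the authors' code).
# A 1-D corridor with action-penalty representation: every move costs -1
# until the goal cell is reached; the goal is absorbing with value 0.

N = 6                # corridor cells 0..5 (assumed layout)
GOAL = N - 1         # goal at the right end
ACTIONS = (-1, +1)   # move left / move right

def step(s, a):
    """Single-agent transition: clamp to the corridor, -1 penalty per move."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, -1.0

def single_agent_values(iters=100):
    """Value iteration for the single-agent problem (gamma = 1 is safe here
    because action penalties make the goal-directed task proper)."""
    V = [0.0] * N
    for _ in range(iters):
        for s in range(N):
            if s == GOAL:
                continue  # absorbing goal keeps value 0
            V[s] = max(r + V[s2] for s2, r in (step(s, a) for a in ACTIONS))
    return V

def init_joint_q(V):
    """Initialize Q-values over *joint* states (s1, s2) from the single-agent
    values: an agent's initial estimate for an action in joint state (s1, s2)
    is the single-agent backup of its own position. This conveys the
    heuristic-approximation idea, not the paper's exact rule."""
    Q = {}
    for s1 in range(N):
        for s2 in range(N):
            for a in ACTIONS:
                s1n, r = step(s1, a)
                Q[(s1, s2), a] = r + V[s1n]
    return Q

V = single_agent_values()   # V[s] = -(GOAL - s) on this corridor
Q0 = init_joint_q(V)
```

With such an initialization, the joint Q-table already ranks goal-directed actions above detours before any multiagent interaction is observed, which is what restricts the effective search space during subsequent learning.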
Pages: 1749 - +
Number of pages: 2
Related Papers (50 total)
  • [1] Adaptive Bases for Q-learning
    Di Castro, Dotan
    Mannor, Shie
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 4587 - 4593
  • [2] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1151 - 1158
  • [3] Q-Learning with linear function approximation
    Melo, Francisco S.
    Ribeiro, M. Isabel
    LEARNING THEORY, PROCEEDINGS, 2007, 4539 : 308 - +
  • [4] An adaptive architecture for modular Q-learning
    Kohri, T
    Matsubayashi, K
    Tokoro, M
    IJCAI-97 - PROCEEDINGS OF THE FIFTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 AND 2, 1997, : 820 - 825
  • [5] Fuzzy Q-Learning with an Adaptive Representation
    Waldock, A.
    Carse, B.
    2008 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2008, : 720 - +
  • [6] Adaptive moving average Q-learning
    Tan, Tao
    Xie, Hong
    Xia, Yunni
    Shi, Xiaoyu
    Shang, Mingsheng
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (12) : 7389 - 7417
  • [7] Gaussian approximation for bias reduction in Q-learning
    D’Eramo, Carlo
    Cini, Andrea
    Nuara, Alessandro
    Pirotta, Matteo
    Alippi, Cesare
    Peters, Jan
    Restelli, Marcello
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [8] ASYNCHRONOUS STOCHASTIC-APPROXIMATION AND Q-LEARNING
    TSITSIKLIS, JN
    MACHINE LEARNING, 1994, 16 (03) : 185 - 202
  • [9] Multiscale Q-learning with linear function approximation
    Bhatnagar, Shalabh
    Lakshmanan, K.
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2016, 26 (03): : 477 - 509
  • [10] Zap Q-learning with Nonlinear Function Approximation
    Chen, Shuhang
    Devraj, Adithya M.
    Lu, Fan
    Busic, Ana
    Meyn, Sean P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33