Adaptive play Q-Learning with initial heuristic approximation

Cited by: 4
Authors:
Burkov, Andriy [1]
Chaib-draa, Brahim [1]
Institution:
[1] Univ Laval, Dept Informat & Genie Logiciel, Ste Foy, PQ G1K 7P4, Canada
Keywords:
DOI:
10.1109/ROBOT.2007.363575
CLC number:
TP [automation technology, computer technology];
Subject classification code:
0812
Abstract:
Effective coordination of multiple autonomous robots is one of the most important problems in modern robotics. In turn, learning to coordinate multiple autonomous agents in a multiagent system is among the hardest challenges in state-of-the-art intelligent system design, principally because the dimensionality of the environment grows exponentially with the number of learning agents. This challenge, known as the "curse of dimensionality," arises because each state of the system is a joint state of all agents and each action is a joint action composed of the actions of every agent. In this paper, we address this problem for a restricted class of environments known as goal-directed stochastic games with action-penalty representation. We use a single-agent problem solution as a heuristic approximation of the agents' initial preferences and, by so doing, substantially restrict the space of multiagent learning. We show theoretically the correctness of such an initialization, and experiments in a well-known two-robot grid world problem show a significant reduction in the complexity of the learning process.
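The idea of seeding multiagent learning from a single-agent solution can be illustrated with a minimal sketch. The following Python fragment is a hypothetical illustration, not the paper's code: it solves a one-dimensional corridor with action-penalty representation (every move costs -1 until the goal) by value iteration, then uses the resulting single-agent values to initialize a Q-table over joint states (s1, s2). The corridor size, the joint-state layout, and the initialization rule `Q[(s1, s2), a] = r + V[s1']` are all assumptions made for illustration.

```python
# Hypothetical sketch of heuristic Q-initialization (not the authors' code).
# A 1-D corridor with action-penalty representation: every move costs -1
# until the goal cell is reached; the goal is absorbing with value 0.

N = 6                # corridor cells 0..5 (assumed layout)
GOAL = N - 1         # goal at the right end
ACTIONS = (-1, +1)   # move left / move right

def step(s, a):
    """Single-agent transition: clamp to the corridor, -1 penalty per move."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, -1.0

def single_agent_values(iters=100):
    """Value iteration for the single-agent problem (gamma = 1 is safe here
    because action penalties make the goal-directed task proper)."""
    V = [0.0] * N
    for _ in range(iters):
        for s in range(N):
            if s == GOAL:
                continue  # absorbing goal keeps value 0
            V[s] = max(r + V[s2] for s2, r in (step(s, a) for a in ACTIONS))
    return V

def init_joint_q(V):
    """Initialize Q-values over *joint* states (s1, s2) from the single-agent
    values: an agent's initial estimate for an action in joint state (s1, s2)
    is the single-agent backup of its own position. This conveys the
    heuristic-approximation idea, not the paper's exact rule."""
    Q = {}
    for s1 in range(N):
        for s2 in range(N):
            for a in ACTIONS:
                s1n, r = step(s1, a)
                Q[(s1, s2), a] = r + V[s1n]
    return Q

V = single_agent_values()   # V[s] = -(GOAL - s) on this corridor
Q0 = init_joint_q(V)
```

With such an initialization, the joint Q-table already ranks goal-directed actions above detours before any multiagent interaction is observed, which is what restricts the effective search space during subsequent learning.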
Pages: 1749 - +
Number of pages: 2
Related Papers (50 total)
  • [1] Adaptive Bases for Q-learning
    Di Castro, Dotan
    Mannor, Shie
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 4587 - 4593
  • [2] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1151 - 1158
  • [3] Q-Learning with linear function approximation
    Melo, Francisco S.
    Ribeiro, M. Isabel
    LEARNING THEORY, PROCEEDINGS, 2007, 4539 : 308 - +
  • [4] An adaptive architecture for modular Q-learning
    Kohri, T
    Matsubayashi, K
    Tokoro, M
    IJCAI-97 - PROCEEDINGS OF THE FIFTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 AND 2, 1997, : 820 - 825
  • [5] Fuzzy Q-Learning with an Adaptive Representation
    Waldock, A.
    Carse, B.
    2008 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2008, : 720 - +
  • [6] Adaptive moving average Q-learning
    Tan, Tao
    Xie, Hong
    Xia, Yunni
    Shi, Xiaoyu
    Shang, Mingsheng
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (12) : 7389 - 7417
  • [7] Gaussian approximation for bias reduction in Q-learning
    D’Eramo, Carlo
    Cini, Andrea
    Nuara, Alessandro
    Pirotta, Matteo
    Alippi, Cesare
    Peters, Jan
    Restelli, Marcello
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [8] ASYNCHRONOUS STOCHASTIC-APPROXIMATION AND Q-LEARNING
    TSITSIKLIS, JN
    MACHINE LEARNING, 1994, 16 (03) : 185 - 202
  • [9] Multiscale Q-learning with linear function approximation
    Bhatnagar, Shalabh
    Lakshmanan, K.
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2016, 26 (03): : 477 - 509
  • [10] Zap Q-learning with Nonlinear Function Approximation
    Chen, Shuhang
    Devraj, Adithya M.
    Lu, Fan
    Busic, Ana
    Meyn, Sean P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33