A modified Q-learning algorithm for multi-robot decision making

Cited by: 0
Authors
Wang, Ying [1]
de Silva, Clarence W. [1]
Affiliations
[1] Univ British Columbia, Dept Mech Engn, Vancouver, BC V6T 1W5, Canada
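Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract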
This paper presents a modified distributed Q-learning algorithm, termed the Sequential Q-learning algorithm with Kalman Filtering (SQKF), for multi-robot decision making. Although Q-learning is commonly employed in the multi-robot domain to support robot operation in dynamic and unknown environments, it faces many challenges there. Scaling the conventional single-agent Q-learning algorithm to the multi-robot domain is questionable because such an extension violates the Markov assumption on which the algorithm is based. Empirical results show that this extension can confuse the robots and leave them unable to learn a good cooperative policy because of incorrect credit assignment among robots, and that it leaves a robot unable to observe the actions of other robots in the same environment. The SQKF algorithm developed in this paper is suited to multi-robot decision making and has two basic characteristics: (1) the learning process is sequential rather than parallel, i.e., the robots do not make decisions simultaneously but learn and make decisions according to a predefined sequence; (2) a robot does not update its Q-values with the observed global reward but employs a specific Kalman filter to extract its real local reward from the global reward, and updates its Q-table with this local reward. The algorithm is intended to solve two problems in multi-robot Q-learning: credit assignment and behavior conflicts. The detailed procedure of the SQKF algorithm is presented and its application is illustrated. Empirical results show that the algorithm performs better than the conventional single-agent Q-learning algorithm or the Team Q-learning algorithm in the multi-robot domain.
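The two characteristics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the filter model or the conflict rule, so everything here is an assumption made for illustration. In particular, the scalar Kalman filter treats the other robots' contribution to the global reward as zero-mean observation noise, the rule that a robot avoids actions already taken earlier in the sequence stands in for conflict resolution, and the names `ScalarKalman`, `Robot`, `sequential_step`, and `env_step` are hypothetical.

```python
import random


class ScalarKalman:
    """Scalar Kalman filter tracking one hidden value from noisy observations."""

    def __init__(self, process_var=1e-4, obs_var=1.0):
        self.x = 0.0          # current estimate of the local reward
        self.p = 1.0          # variance of that estimate
        self.q = process_var  # assumed drift of the true local reward
        self.r = obs_var      # assumed variance of the other robots' contribution

    def update(self, observation):
        self.p += self.q                         # predict: uncertainty grows
        gain = self.p / (self.p + self.r)        # Kalman gain
        self.x += gain * (observation - self.x)  # correct with the innovation
        self.p *= 1.0 - gain
        return self.x


class Robot:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.filters = {}  # one filter per (state, action) pair (assumption)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state, taken, rng):
        # `taken` holds actions picked by robots earlier in the sequence.
        # Avoiding them is an assumed stand-in for behavior-conflict handling.
        n_actions = len(self.q[state])
        allowed = [a for a in range(n_actions) if a not in taken]
        if not allowed:
            allowed = list(range(n_actions))
        if rng.random() < self.epsilon:  # epsilon-greedy exploration
            return rng.choice(allowed)
        return max(allowed, key=lambda a: self.q[state][a])

    def learn(self, state, action, global_reward, next_state):
        # Estimate the local reward from the global one, then apply the
        # standard Q-learning update using that estimate.
        kf = self.filters.setdefault((state, action), ScalarKalman())
        local_reward = kf.update(global_reward)
        target = local_reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        return local_reward


def sequential_step(robots, state, env_step, rng):
    """One round: robots decide one after another in list order."""
    taken = []
    for robot in robots:
        taken.append(robot.choose(state, taken, rng))
    global_reward, next_state = env_step(state, taken)
    for robot, action in zip(robots, taken):
        robot.learn(state, action, global_reward, next_state)
    return next_state, taken
```

Here `env_step(state, actions)` is a caller-supplied function returning `(global_reward, next_state)`. Keeping one filter per state-action pair is a design choice of this sketch; a filter with a richer state model could separate the local reward from the other robots' contribution more faithfully.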
Pages: 1275-1281
Number of pages: 7