A modified Q-learning algorithm for multi-robot decision making

Cited by: 0

Authors:

Wang, Ying [1]

de Silva, Clarence W. [1]

Affiliations:

[1] Univ British Columbia, Dept Mech Engn, Vancouver, BC V6T 1W5, Canada

Source:

PROCEEDINGS OF THE ASME INTERNATIONAL MECHANICAL ENGINEERING CONGRESS AND EXPOSITION 2007, VOL 9, PTS A-C: MECHANICAL SYSTEMS AND CONTROL | 2008

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper presents a modified distributed Q-learning algorithm, termed the Sequential Q-learning algorithm with Kalman Filtering (SQKF), for multi-robot decision making. Although Q-learning is commonly employed in the multi-robot domain to support robot operation in dynamic and unknown environments, it faces many challenges there. Scaling the conventional single-agent Q-learning algorithm to the multi-robot domain is questionable because such an extension violates the Markov assumption on which the algorithm is based. Empirical results show that the extension can confuse the robots and leave them unable to learn a good cooperative policy, because credit is assigned incorrectly among the robots and a robot cannot observe the actions of the other robots in the same environment. The SQKF algorithm is developed to suit multi-robot decision making. Its basic characteristics are: (1) the learning process is sequential rather than parallel, i.e., the robots do not make decisions simultaneously but learn and make decisions according to a predefined sequence; (2) a robot does not update its Q-values with the observed global reward but instead uses a specific Kalman filter to extract its real local reward from the global reward, and updates its Q-table with this local reward. The SQKF algorithm is intended to solve two problems in multi-robot Q-learning: credit assignment and behavior conflicts. The detailed procedure of the SQKF algorithm is presented and its application is illustrated. Empirical results show that the algorithm performs better than both the conventional single-agent Q-learning algorithm and the Team Q-learning algorithm in the multi-robot domain.
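Characteristic (2) in the abstract can be illustrated with a minimal sketch. It is not the authors' SQKF procedure, which the abstract does not spell out. It assumes a common modelling choice: the global reward observed by a robot is its local reward for the current state plus a drifting offset caused by the other robots, and the offset follows a random walk. The class `RewardKalmanFilter`, the functions `q_update` and `simulate`, and all parameter values are hypothetical names and numbers chosen for illustration.

```python
import random


class RewardKalmanFilter:
    """Assumed model: global reward g = r(state) + b, with b a random walk.

    The filter state is x = [r(0), ..., r(n-1), b].
    """

    def __init__(self, n_states, drift_var=0.1, obs_var=1e-3):
        n = n_states + 1
        self.x = [0.0] * n  # estimates of local rewards and offset
        self.P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.drift_var = drift_var  # assumed variance of the offset drift
        self.obs_var = obs_var      # assumed observation noise variance

    def update(self, state, global_reward):
        n = len(self.x)
        # Predict: local rewards are static, only the offset b drifts.
        self.P[-1][-1] += self.drift_var
        # Observation vector H has ones at `state` and at the offset entry,
        # so P @ H is the sum of those two columns of P.
        ph = [self.P[i][state] + self.P[i][-1] for i in range(n)]
        s = ph[state] + ph[-1] + self.obs_var  # innovation variance
        innovation = global_reward - (self.x[state] + self.x[-1])
        for i in range(n):
            self.x[i] += ph[i] / s * innovation
            for j in range(n):
                self.P[i][j] -= ph[i] * ph[j] / s
        # Local rewards are identifiable only up to a common constant.
        return self.x[state]


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, fed here with a local reward."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


def simulate(n_steps=2000, seed=0):
    """Toy data: two states with local rewards 0 and 5 plus a drifting offset."""
    rng = random.Random(seed)
    true_local = [0.0, 5.0]
    offset = 0.0
    kf = RewardKalmanFilter(n_states=2)
    for _ in range(n_steps):
        state = rng.randrange(2)
        offset += rng.gauss(0.0, 0.1 ** 0.5)  # other robots' contribution
        kf.update(state, true_local[state] + offset)
    return kf
```

In this toy setting the difference `kf.x[1] - kf.x[0]` should approach the true gap of 5 between the two local rewards, even though the global reward drifts. In a sequential scheme as described in characteristic (1), each robot would take its turn in a fixed order and pass the value returned by `update` to `q_update` in place of the raw global reward.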

Pages: 1275-1281

Number of pages: 7
