Safe multi-agent reinforcement learning for multi-robot control

Cited by: 18
Authors
Gu, Shangding [1, 4]
Kuba, Jakub Grudzien [2]
Chen, Yuanpei [4]
Du, Yali [3]
Yang, Long [4]
Knoll, Alois [1]
Yang, Yaodong [4]
Affiliations
[1] Tech Univ Munich, Dept Comp Sci, Munich, Germany
[2] Univ Oxford, Dept Stat, Oxford, England
[3] Kings Coll London, Dept Informat, London, England
[4] Peking Univ, Inst Artificial Intelligence, Beijing, Peoples R China
Keywords
Constrained Markov game; Constrained policy optimisation; Safe multi-agent benchmarks; Safe multi-robot control; NETWORKS;
DOI
10.1016/j.artint.2023.103905
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A challenging problem in robotics is how to control multiple robots cooperatively and safely in real-world applications. Yet developing multi-robot control methods from the perspective of safe multi-agent reinforcement learning (MARL) has barely been studied. To fill this gap, we investigate safe MARL for multi-robot control on cooperative tasks, in which each individual robot must not only meet its own safety constraints while maximising their reward, but also consider those of others to guarantee safe team behaviours. Firstly, we formulate the safe MARL problem as a constrained Markov game and employ policy optimisation to solve it theoretically. The proposed algorithm guarantees monotonic improvement in reward and satisfaction of safety constraints at every iteration. Secondly, as approximations to the theoretical solution, we propose two safe multi-agent policy gradient methods: Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian. Thirdly, we develop the first three safe MARL benchmarks, Safe Multi-Agent MuJoCo (Safe MAMuJoCo), Safe Multi-Agent Robosuite (Safe MARobosuite) and Safe Multi-Agent Isaac Gym (Safe MAIG), to expand the toolkit of the MARL and robot control research communities. Finally, experimental results on the three safe MARL benchmarks indicate that our methods achieve state-of-the-art performance in balancing reward improvement against safety-constraint satisfaction, compared with strong baselines. Demos and code are available at https://sites.google.com/view/aij-safe-marl/. Crown Copyright (c) 2023 Published by Elsevier B.V. All rights reserved.
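As a rough illustration of the Lagrangian idea behind a method such as MAPPO-Lagrangian, the sketch below shows the generic pattern used in Lagrangian-based constrained policy optimisation: a penalised advantage for the policy update and a projected gradient-ascent step on the multiplier. It is not the authors' implementation. The function names, the learning rate and the cost limit are hypothetical and chosen only for illustration.

```python
def lagrangian_objective(reward_adv, cost_adv, lam):
    """Penalised advantage: reward advantage minus the lam-weighted cost advantage.

    Hypothetical helper; a real method would apply this per agent and per sample.
    """
    return reward_adv - lam * cost_adv


def update_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """One projected gradient-ascent step on the Lagrange multiplier.

    The multiplier grows while the observed cost exceeds the limit and
    shrinks otherwise; it is clipped at zero so the penalty never turns
    into a bonus for incurring cost.
    """
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical mean episode costs observed over three training iterations.
    for cost in [3.0, 2.0, 0.5]:
        lam = update_multiplier(lam, cost, cost_limit=1.0)
        print(f"cost={cost:.1f} -> lambda={lam:.3f}")
```

The multiplier rises while the cost limit is violated and decays once the constraint is met, which is how this family of methods trades reward against safety during training.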
Pages: 24
Related papers
50 in total
  • [11] A multi-agent reinforcement learning approach to robot soccer
    Yong Duan
    Bao Xia Cui
    Xin He Xu
    Artificial Intelligence Review, 2012, 38 : 193 - 211
  • [12] A multi-agent reinforcement learning approach to robot soccer
    Duan, Yong
    Cui, Bao Xia
    Xu, Xin He
    ARTIFICIAL INTELLIGENCE REVIEW, 2012, 38 (03) : 193 - 211
  • [13] Multi-agent reinforcement learning for character control
    Li, Cheng
    Fussell, Levi
    Komura, Taku
    VISUAL COMPUTER, 2021, 37 (12) : 3115 - 3123
  • [14] Multi-agent reinforcement learning for character control
    Cheng Li
    Levi Fussell
    Taku Komura
    The Visual Computer, 2021, 37 : 3115 - 3123
  • [15] A multi-agent system for multi-robot mapping and exploration
    Konolige, K
    Guzzoni, D
    Nicewarner, K
    MULTI-ROBOT SYSTEMS: FROM SWARMS TO INTELLIGENT AUTOMATA, 2002 : 11 - 19
  • [16] Multi-robot exploration using multi-agent approach
    Kulich, Miroslav
    Rollo, Milan
    Mazl, Roman
    Chudoba, Jan
    Benda, Petr
    Preucil, Libor
    Pechoucek, Michal
    PROCEEDINGS OF THE 13TH IASTED INTERNATIONAL CONFERENCE ON ROBOTICS AND APPLICATIONS/PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON TELEMATICS, 2007, : 495 - +
  • [17] Multi-agent reinforcement learning for redundant robot control in task-space
    Perrusquia, Adolfo
    Yu, Wen
    Li, Xiaoou
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (01) : 231 - 241
  • [18] Multi-agent reinforcement learning for redundant robot control in task-space
    Adolfo Perrusquía
    Wen Yu
    Xiaoou Li
    International Journal of Machine Learning and Cybernetics, 2021, 12 : 231 - 241
  • [19] Reinforcement learning in the multi-robot domain
    Mataric, MJ
    AUTONOMOUS ROBOTS, 1997, 4 (01) : 73 - 83
  • [20] Reinforcement Learning in the Multi-Robot Domain
    Maja J. Matarić
    Autonomous Robots, 1997, 4 : 73 - 83