Asymmetric Self-Play-Enabled Intelligent Heterogeneous Multirobot Catching System Using Deep Multiagent Reinforcement Learning

Cited by: 12

Authors:

Gao, Yuan [1]

Chen, Junfeng [1]

Chen, Xi [2]

Wang, Chongyang [3]

Hu, Junjie [1]

Deng, Fuqin [1]

Lam, Tin Lun [1,4]

Affiliations:

[1] Chinese Univ Hong Kong, Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen 518100, Peoples R China

[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China

[3] UCL, London WC1E6BT, England

[4] Chinese Univ Hong Kong, Sch Sci & Engn, Shenzhen 518100, Peoples R China

Source:

IEEE TRANSACTIONS ON ROBOTICS | 2023 / Vol. 39 / Issue 04

Funding:

National Key Research and Development Program;

Keywords:

Robots; Task analysis; Training; Planning; Multi-robot systems; Mathematical models; Heuristic algorithms; Asymmetric self-play; catching systems; heterogeneous multirobot system (HMRS); reinforcement learning (RL); DECENTRALIZED CONTROL; CONTROL POLICIES; NEURAL-NETWORKS; TRACKING; ROBOTS; GAME; VEHICLES; SEARCH; GO;

DOI:

10.1109/TRO.2023.3257541

Chinese Library Classification number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Aiming to develop a more robust and intelligent heterogeneous system for adversarial catching in security and rescue tasks, in this article, we discuss the specialities of applying asymmetric self-play and curriculum learning techniques to deal with the increasing heterogeneity and number of different robots in modern heterogeneous multirobot systems (HMRS). Our method, based on actor-critic multiagent reinforcement learning, provides a framework that can enable cooperative behaviors among heterogeneous multirobot teams. This leads to the development of an HMRS for complex catching scenarios that involve several robot teams and real-world constraints. We conduct simulated experiments to evaluate different mechanisms' influence on our method's performance, and real-world experiments to assess our system's performance in complex real-world catching problems. In addition, a bridging study is conducted to compare our method with a state-of-the-art method called S2M2 in heterogeneous catching problems, and our method performs better in adversarial settings. As a result, we show that the proposed framework, through fusing asymmetric self-play and curriculum learning during training, is able to successfully complete the HMRS catching task under realistic constraints in both simulation and the real world, thus providing a direction for future large-scale intelligent security & rescue HMRS.

Pages: 2603-2622

Page count: 20