A Deep Reinforcement Learning Approach Using Asymmetric Self-Play for Robust Multirobot Flocking

Cited by: 0
Authors
Jia, Yunjie [1 ]
Song, Yong [1 ]
Cheng, Jiyu [2 ]
Jin, Jiong [3 ]
Zhang, Wei [2 ]
Yang, Simon X. [4 ]
Kwong, Sam [5 ]
Affiliations
[1] Shandong Univ, Sch Mech Elect & Informat Engn, Shandong Key Lab Intelligent Elect Packaging Testi, Weihai 264209, Peoples R China
[2] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China
[3] Swinburne Univ Technol, Sch Sci Comp & Engn Technol, Hawthorn, VIC 3122, Australia
[4] Univ Guelph, Adv Robot & Intelligent Syst Lab, Guelph, ON N1G 2W1, Canada
[5] Lingnan Univ, Sch Data Sci, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robots; Adaptation models; Training; Collision avoidance; Navigation; Multi-robot systems; Uncertainty; Robot sensing systems; Robustness; Vehicle dynamics; Adversarial training; flocking; multiagent deep reinforcement learning (MADRL); autonomous vehicles; NONLINEAR MULTIAGENT SYSTEMS; OUTPUT REGULATION; ENHANCEMENT; UAVS;
DOI
10.1109/TII.2024.3523576
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Flocking control, an essential approach for survivable navigation of multirobot systems, has been widely applied in fields such as logistics, service delivery, and search and rescue. However, realistic environments are typically complex, dynamic, and even aggressive, posing considerable threats to the safety of flocking robots. In this article, an Asymmetric Self-play-empowered Flocking Control framework based on deep reinforcement learning is proposed to address this concern. Specifically, the flocking robots are trained concurrently with learnable adversarial interferers to stimulate the intelligence of the flocking strategy. A two-stage self-play training paradigm is developed to improve the robustness and generalization of the model. Furthermore, an auxiliary training module for learning transition dynamics is designed, dramatically enhancing adaptability to environmental uncertainties. Feature-level and agent-level attention are implemented for action and value generation, respectively. Both extensive comparative experiments and real-world deployment demonstrate the superiority and practicality of the proposed framework.
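To make the two-stage asymmetric self-play idea concrete, here is a deliberately minimal toy sketch. Everything in it (`LinearPolicy`, `episode_return`, the gradient-free `hill_climb` update) is a hypothetical stand-in chosen for brevity, not the paper's MADRL, attention, or auxiliary-module implementation; it only illustrates the training schedule: stage 1 trains the flock against a passive adversary, stage 2 alternates adversary and flock updates.

```python
# Toy sketch of a two-stage asymmetric self-play schedule.
# All components here are illustrative stand-ins, not the paper's method.
import random

class LinearPolicy:
    """1-D toy policy: action = w * observation."""
    def __init__(self, w=0.0):
        self.w = w
    def act(self, obs):
        return self.w * obs
    def perturbed(self, scale=0.1):
        return LinearPolicy(self.w + random.uniform(-scale, scale))

def episode_return(flock, adversary, steps=20):
    """Flock tries to drive the state to 0; the adversary injects disturbance."""
    x, ret = 1.0, 0.0
    for _ in range(steps):
        x = x + flock.act(x) + 0.1 * adversary.act(x)
        ret -= x * x          # flock reward: negative squared distance from 0
    return ret

def hill_climb(policy, objective, iters=200):
    """Gradient-free stand-in for an RL policy update (accepts only improvements)."""
    best, best_val = policy, objective(policy)
    for _ in range(iters):
        cand = best.perturbed()
        val = objective(cand)
        if val > best_val:
            best, best_val = cand, val
    return best

random.seed(0)
flock, adversary = LinearPolicy(), LinearPolicy()

# Stage 1: train the flock against a fixed, initially passive adversary.
flock = hill_climb(flock, lambda p: episode_return(p, adversary))
stage1_score = episode_return(flock, LinearPolicy())

# Stage 2: asymmetric alternation -- the adversary maximizes the flock's
# loss, then the flock adapts to the stronger adversary.
for _ in range(3):
    adversary = hill_climb(adversary, lambda p: -episode_return(flock, p))
    flock = hill_climb(flock, lambda p: episode_return(p, adversary))
```

The asymmetry lies in the opposed objectives (the interferer is rewarded for the flock's failure) and in the update order: the adversary always moves first within each round, so the flock is repeatedly forced to generalize to a newly strengthened opponent.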
Pages: 10