Weighted mean field reinforcement learning for large-scale UAV swarm confrontation

Cited by: 17
Authors
Wang, Baolai [1]
Li, Shengang [1]
Gao, Xianzhong [2]
Xie, Tao [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Coll Aerosp Sci & Engn, Changsha 410073, Hunan, Peoples R China
Keywords
Multi-agent reinforcement learning; Unmanned aerial vehicle; Swarm confrontation; Attention mechanism; LEVEL;
DOI
10.1007/s10489-022-03840-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Finding the optimal game strategy is a difficult problem in unmanned aerial vehicle (UAV) swarm confrontation. As an effective solution to the sequential decision-making problem, multi-agent reinforcement learning (MARL) provides a promising way to realize intelligent countermeasures. However, there are two challenges in applying MARL to large-scale UAV swarm confrontation: i) the curse of dimensionality caused by the excessive scale of UAV clusters and ii) the generalization problem caused by the dynamically changing UAV cluster size. To address these problems, we propose a novel MARL paradigm, called Weighted Mean Field Reinforcement Learning, where the pairwise communication between any UAV and its neighbors is modeled as that between a central UAV and the virtual UAV, which is abstracted from the weighted mean effect of neighboring UAVs. This approach reduces the multi-agent problem to a two-agent problem, which can reduce the input dimension of the agent and adapt to the changing cluster size. The communication content between UAVs includes actions and local observations. Actions can enhance the cooperation between UAVs and alleviate the non-stationarity of the environment, while local observations can expand the perception range of the central UAV so that it can obtain more useful information about the environment. The attention mechanism is leveraged to enable UAVs to select more valuable information flexibly, making our method more scalable than other algorithms. Combining this paradigm with double Q-learning and actor-critic algorithms, we propose weighted mean field Q-learning (WMFQ) and weighted mean field actor-critic (WMFAC) algorithms. Experiments on our constructed UAV swarm confrontation environment verify the effectiveness and scalability of our algorithms.
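The weighted mean idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `virtual_neighbor`, the use of scaled dot-product scores on raw observation vectors, and the one-hot action encoding are all assumptions made for illustration. The sketch only shows how attention weights can collapse a variable number of neighbors into a single fixed-size "virtual UAV".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def virtual_neighbor(central_obs, neighbor_obs, neighbor_actions, n_actions):
    """Collapse any number of neighbors into one hypothetical 'virtual' neighbor.

    Assumption: attention scores are scaled dot products between raw
    observation vectors; a trained model would likely use learned projections.
    Returns (weighted mean observation, weighted mean one-hot action).
    """
    dim = len(central_obs)
    if not neighbor_obs:
        # No neighbors in range: return an all-zero virtual neighbor.
        return [0.0] * dim, [0.0] * n_actions

    scale = math.sqrt(dim)
    weights = softmax([dot(central_obs, o) / scale for o in neighbor_obs])

    # Weighted mean of neighbor observations, one component at a time.
    mean_obs = [
        sum(w * o[k] for w, o in zip(weights, neighbor_obs)) for k in range(dim)
    ]

    # Weighted mean of one-hot actions: add each weight to its action slot.
    mean_action = [0.0] * n_actions
    for w, a in zip(weights, neighbor_actions):
        mean_action[a] += w

    return mean_obs, mean_action
```

Whatever the number of neighbors, the output has length `len(central_obs)` plus `n_actions`, which is the property the abstract relies on when it says the approach reduces the input dimension and adapts to a changing cluster size.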
Pages: 5274-5289
Page count: 16
Related papers
50 in total
  • [31] Deep Reinforcement Learning for Large-Scale Epidemic Control
    Libin, Pieter J. K.
    Moonens, Arno
    Verstraeten, Timothy
    Perez-Sanjines, Fabian
    Hens, Niel
    Lemey, Philippe
    Nowe, Ann
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V, 2021, 12461 : 155 - 170
  • [32] Mean Field Modeling of Large-Scale Energy Systems
    Gentile, Basilio
    Grammatico, Sergio
    Lygeros, John
    IFAC PAPERSONLINE, 2015, 48 (01): : 918 - +
  • [33] Random Label Based Security Authentication Mechanism for Large-scale UAV Swarm
    Liu, Liangjun
    Qian, Hongyan
    Hu, Feng
    2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 229 - 235
  • [34] Decentralized Multi-agent Reinforcement Learning for Large-scale Mobile Wireless Sensor Network Control Using Mean Field Games
    Zhou, Zejian
    Qian, Lijun
    Xu, Hao
    2024 33RD INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, ICCCN 2024, 2024,
  • [35] Event-Triggered Optimal Formation Tracking Control Using Reinforcement Learning for Large-Scale UAV Systems
    Yan, Ziwei
    Han, Liang
    Li, Xiaoduo
    Li, Jinjie
    Ren, Zhang
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 3233 - 3239
  • [36] Evolutionary Multi-Objective Deep Reinforcement Learning for Autonomous UAV Navigation in Large-Scale Complex Environments
    An, Guangyan
    Wu, Ziyu
    Shen, Zhilong
    Shang, Ke
    Ishibuchi, Hisao
    PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, GECCO 2023, 2023, : 633 - 641
  • [37] C-SPPO: A deep reinforcement learning framework for large-scale dynamic logistics UAV routing problem
    Wang, Fei
    Zhang, Honghai
    Du, Sen
    Hua, Mingzhuang
    Zhong, Gang
    CHINESE JOURNAL OF AERONAUTICS, 2025, 38 (05):
  • [38] A sinusoidal social learning swarm optimizer for large-scale optimization
    Liu, Nengxian
    Pan, Jeng-Shyang
    Chu, Shu-Chuan
    Hu, Pei
    KNOWLEDGE-BASED SYSTEMS, 2023, 259
  • [39] Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning
    Ghasemipour, Seyed Kamyar Seyed
    Freeman, Daniel
    David, Byron
    Gu, Shixiang Shane
    Kataoka, Satoshi
    Mordatch, Igor
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [40] Graph-based multi-agent reinforcement learning for large-scale UAVs swarm system control
    Zhao, Bocheng
    Huo, Mingying
    Li, Zheng
    Yu, Ze
    Qi, Naiming
    AEROSPACE SCIENCE AND TECHNOLOGY, 2024, 150