An Algorithm for UAV Pursuit-Evasion Game Based on MADDPG and Contrastive Learning

Cited by: 0
Authors
Wang R. [1 ]
Wang X. [1 ]
Affiliations
[1] School of Astronautics, Beijing Institute of Technology, Beijing
Source
Yuhang Xuebao/Journal of Astronautics | 2024, Vol. 45, No. 2
Keywords
Deep contrastive learning; Multi-agent; Nash equilibrium; Pursuit-evasion game; Reinforcement learning; Unmanned aerial vehicle (UAV)
DOI
10.3873/j.issn.1000-1328.2024.02.011
Abstract
To solve the pursuit-evasion game problem of unmanned aerial vehicles (UAVs) in complex combat environments, a Markov model is established, and reward functions for both the pursuer and the evader are designed under the zero-sum game concept. A centralized-training, distributed-execution framework based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm is constructed to solve for the Nash equilibrium of the pursuit-evasion game. To address the difficulty of analytically representing the high-dimensional capture (escape) regions characterized by the initial positions of the pursuers and evaders, a deep contrastive learning algorithm built on the MADDPG network is proposed, which represents the high-dimensional capture (escape) regions indirectly through the construction and training of a Siamese network. Simulation results show that the MADDPG algorithm obtains the Nash equilibrium solution of the UAV pursuit-evasion game under the given conditions, and that combining the contrastive learning algorithm with the converged MADDPG network represents the high-dimensional capture (escape) regions with 95% accuracy. © 2024 Chinese Society of Astronautics. All rights reserved.
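The key mechanism in the abstract is the centralized-training, distributed-execution split: at run time each UAV's actor acts on its own observation only, while during training a critic scores the joint observations and actions of all agents. Below is a minimal PyTorch sketch of that structure; the paper's actual architectures, dimensions, and hyperparameters are not given in the abstract, so every name and size here (Actor, CentralCritic, OBS_DIM, etc.) is an illustrative assumption.

# Minimal sketch of the centralized-training/decentralized-execution idea in
# MADDPG, assuming a two-agent (pursuer vs. evader) zero-sum setup. All names
# and dimensions are hypothetical; the paper's implementation is not given.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 12, 3, 2  # assumed sizes for illustration

class Actor(nn.Module):
    """Decentralized actor: each agent acts on its own observation only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # bounded control command

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint observations and joint actions
    of all agents, which is what makes the training 'centralized'."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs, all_act], dim=-1))

# Execution is distributed: each UAV evaluates only its own actor.
actors = [Actor() for _ in range(N_AGENTS)]
obs = [torch.randn(1, OBS_DIM) for _ in range(N_AGENTS)]
actions = [a(o) for a, o in zip(actors, obs)]

# Training is centralized: the critic sees everything.
critic = CentralCritic()
q = critic(torch.cat(obs, dim=-1), torch.cat(actions, dim=-1))

In a zero-sum setup such as the one the abstract describes, the pursuer's and evader's rewards are negatives of each other, so the two agents' critics estimate opposing value functions.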
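The abstract also describes representing the capture (escape) region indirectly with a Siamese network trained by contrastive learning. One common way to realize this, sketched below under stated assumptions, is to embed initial pursuer/evader states with a shared-weight encoder and train it with a margin-based contrastive loss so that states with the same game outcome (both capture, or both escape) cluster together; the encoder architecture, margin, and pair-labeling scheme here are illustrative assumptions, not the paper's specification.

# Minimal sketch of a Siamese network with a contrastive loss for implicitly
# representing the capture/escape region. Outcome labels are assumed to come
# from rollouts of the converged MADDPG policies, as the abstract suggests.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM = 12  # assumed: concatenated initial positions of pursuers/evader

class SiameseEncoder(nn.Module):
    """Shared-weight encoder applied to both inputs of a pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 16))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull same-outcome pairs together; push different-outcome
    pairs at least `margin` apart."""
    d = (z1 - z2).norm(dim=-1)
    return torch.where(same_label,
                       d.pow(2),
                       (margin - d).clamp(min=0).pow(2)).mean()

# One hypothetical training step on a batch of labeled state pairs.
enc = SiameseEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
s1, s2 = torch.randn(32, STATE_DIM), torch.randn(32, STATE_DIM)
same = torch.randint(0, 2, (32,)).bool()  # True if outcomes match
loss = contrastive_loss(enc(s1), enc(s2), same)
opt.zero_grad(); loss.backward(); opt.step()

At inference time a query initial state can be classified as capture or escape by comparing its embedding distance to labeled anchor states, which is one way to "indirectly represent" a region that has no tractable analytic boundary.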
Pages: 262-272
Page count: 10