An Algorithm for UAV Pursuit-Evasion Game Based on MADDPG and Contrastive Learning

Cited by: 0
Authors
Wang R. [1 ]
Wang X. [1 ]
Affiliations
[1] School of Astronautics, Beijing Institute of Technology, Beijing
Source
Yuhang Xuebao/Journal of Astronautics | 2024, Vol. 45, No. 2
Keywords
Deep contrastive learning; Multi-agent; Nash equilibrium; Pursuit-evasion game; Reinforcement learning; Unmanned aerial vehicle (UAV)
DOI
10.3873/j.issn.1000-1328.2024.02.011
Abstract
To solve the pursuit-evasion game problem of unmanned aerial vehicles (UAVs) in complex combat environments, a Markov model is established, and reward functions for both the pursuer and the evader are designed under the zero-sum game concept. A centralized-training, distributed-execution framework based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm is constructed to solve for the Nash equilibrium of the pursuit-evasion game. To address the difficulty of analytically representing the high-dimensional capture (escape) regions characterized by the initial positions of the pursuers and evaders, a deep contrastive learning algorithm built on the MADDPG network is proposed, which represents the high-dimensional capture (escape) regions indirectly through the construction and training of a Siamese network. Simulation results show that the MADDPG algorithm obtains the Nash equilibrium solution of the UAV pursuit-evasion game under the given conditions, and that combining the contrastive learning algorithm with the converged MADDPG network represents the high-dimensional capture (escape) regions with 95% accuracy. © 2024 Chinese Society of Astronautics. All rights reserved.
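The key mechanism in the abstract is the centralized-training, distributed-execution split: at run time each UAV's actor acts on its own observation only, while during training a critic scores the joint observations and actions of all agents. Below is a minimal PyTorch sketch of that structure; the paper's actual architectures, dimensions, and hyperparameters are not given in the abstract, so every name and size here (Actor, CentralCritic, OBS_DIM, etc.) is an illustrative assumption.

# Minimal sketch of the centralized-training/decentralized-execution idea in
# MADDPG, assuming a two-agent (pursuer vs. evader) zero-sum setup. All names
# and dimensions are hypothetical; the paper's implementation is not given.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 12, 3, 2  # assumed sizes for illustration

class Actor(nn.Module):
    """Decentralized actor: each agent acts on its own observation only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # bounded control command

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint observations and joint actions
    of all agents, which is what makes the training 'centralized'."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs, all_act], dim=-1))

# Execution is distributed: each UAV evaluates only its own actor.
actors = [Actor() for _ in range(N_AGENTS)]
obs = [torch.randn(1, OBS_DIM) for _ in range(N_AGENTS)]
actions = [a(o) for a, o in zip(actors, obs)]

# Training is centralized: the critic sees everything.
critic = CentralCritic()
q = critic(torch.cat(obs, dim=-1), torch.cat(actions, dim=-1))

In a zero-sum setup such as the one the abstract describes, the pursuer's and evader's rewards are negatives of each other, so the two agents' critics estimate opposing value functions.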
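The abstract also describes representing the capture (escape) region indirectly with a Siamese network trained by contrastive learning. One common way to realize this, sketched below under stated assumptions, is to embed initial pursuer/evader states with a shared-weight encoder and train it with a margin-based contrastive loss so that states with the same game outcome (both capture, or both escape) cluster together; the encoder architecture, margin, and pair-labeling scheme here are illustrative assumptions, not the paper's specification.

# Minimal sketch of a Siamese network with a contrastive loss for implicitly
# representing the capture/escape region. Outcome labels are assumed to come
# from rollouts of the converged MADDPG policies, as the abstract suggests.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM = 12  # assumed: concatenated initial positions of pursuers/evader

class SiameseEncoder(nn.Module):
    """Shared-weight encoder applied to both inputs of a pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 16))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull same-outcome pairs together; push different-outcome
    pairs at least `margin` apart."""
    d = (z1 - z2).norm(dim=-1)
    return torch.where(same_label,
                       d.pow(2),
                       (margin - d).clamp(min=0).pow(2)).mean()

# One hypothetical training step on a batch of labeled state pairs.
enc = SiameseEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
s1, s2 = torch.randn(32, STATE_DIM), torch.randn(32, STATE_DIM)
same = torch.randint(0, 2, (32,)).bool()  # True if outcomes match
loss = contrastive_loss(enc(s1), enc(s2), same)
opt.zero_grad(); loss.backward(); opt.step()

At inference time a query initial state can be classified as capture or escape by comparing its embedding distance to labeled anchor states, which is one way to "indirectly represent" a region that has no tractable analytic boundary.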
Pages: 262-272
Page count: 10