Cooperative decision-making algorithm with efficient convergence for UCAV formation in beyond-visual-range air combat based on multi-agent reinforcement learning

Cited by: 1

Authors:

Zhou, Yaoming [1]

Yang, Fan [1]

Zhang, Chaoyue [1]

Li, Shida [1]

Wang, Yongchao [2]

Affiliations:

[1] Beihang Univ, Sch Aeronaut Sci & Engn, Beijing 100191, Peoples R China

[2] Zhejiang Univ, Inst Cyber Syst & Control, Key Lab Ind Control Technol, Hangzhou 310027, Peoples R China

Source:

CHINESE JOURNAL OF AERONAUTICS | 2024 / Vol. 37 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Unmanned combat aerial vehicle (UCAV) formation; Decision-making; Beyond-visual-range (BVR) air combat; Advantage highlight; Multi-agent reinforcement learning (MARL);

DOI:

10.1016/j.cja.2024.04.008

Chinese Library Classification (CLC) number:

V [Aviation, Aerospace];

Discipline classification codes:

08 ; 0825 ;

Abstract:

Highly intelligent Unmanned Combat Aerial Vehicle (UCAV) formations are expected to bring out their strengths in Beyond-Visual-Range (BVR) air combat. Although Multi-Agent Reinforcement Learning (MARL) performs well in cooperative decision-making, existing MARL algorithms struggle to converge quickly to an optimal strategy for UCAV formations in BVR air combat, where the confrontation is complicated and the reward is extremely sparse and delayed. To solve this problem, this paper proposes an Advantage Highlight Multi-Agent Proximal Policy Optimization (AHMAPPO) algorithm. First, at every step, AHMAPPO records the degree to which the best formation exceeds the average of the formations in parallel environments and carries out additional advantage sampling according to it. Then, the sampling result is introduced into the update of the actor network to improve its optimization efficiency. Finally, simulation results show that, compared with some state-of-the-art MARL algorithms, AHMAPPO obtains a better strategy using fewer sample episodes in the UCAV formation BVR air combat simulation environment built in this paper, which reflects the critical features of BVR air combat. The AHMAPPO can significantly increase the convergence efficiency
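
The abstract only outlines the mechanism, so the following is a loose illustration rather than the published AHMAPPO. It assumes a table of per-step advantages across parallel environments, measures at each step how far the best environment exceeds the mean, and gives the best sample extra copies inside a standard PPO clipped objective. The names `highlight_gap`, `extra_sample_counts`, `actor_objective`, the `max_extra` parameter, and the proportional scaling rule are all hypothetical choices made for this sketch.

```python
def highlight_gap(advantages_t):
    """Gap between the best and the mean advantage across parallel environments at one step."""
    mean = sum(advantages_t) / len(advantages_t)
    return max(advantages_t) - mean


def extra_sample_counts(advantages, max_extra=3):
    """Number of extra copies of the best sample to add at each step.

    Hypothetical rule: scale each step's gap by the largest gap in the
    rollout and truncate to an integer in [0, max_extra].
    """
    gaps = [highlight_gap(row) for row in advantages]
    largest = max(gaps)
    if largest <= 0.0:
        return [0] * len(gaps)
    return [int(max_extra * g / largest) for g in gaps]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate term for one sample."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def actor_objective(ratios, advantages, max_extra=3, clip_eps=0.2):
    """Mean clipped surrogate over the rollout plus the extra copies.

    ratios[t][e] and advantages[t][e] are indexed by step t and
    parallel environment e. Repeating a sample k times is equivalent
    to weighting it by k + 1, which is what this loop does.
    """
    counts = extra_sample_counts(advantages, max_extra)
    total, n = 0.0, 0
    for t, row in enumerate(advantages):
        best = max(range(len(row)), key=lambda e: row[e])
        for e, adv in enumerate(row):
            weight = 1 + (counts[t] if e == best else 0)
            total += weight * clipped_surrogate(ratios[t][e], adv, clip_eps)
            n += weight
    return total / n


# Two steps, three parallel environments; policy ratios are all 1.0.
advantages = [[1.0, 1.0, 1.0], [0.0, 3.0, 0.0]]
ratios = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(extra_sample_counts(advantages))
print(actor_objective(ratios, advantages))
```

In this toy rollout only the second step has a standout environment, so only that step receives extra copies, and the objective is pulled toward the high-advantage sample compared with a plain average. Whether the paper resamples, reweights, or adds a separate loss term cannot be determined from the abstract.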


Pages: 311-328

Number of pages: 18
