Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework

Cited by: 16
Authors
Zhan, Guang [1]
Zhang, Xinmiao [2]
Li, Zhongchao [3]
Xu, Lin [2]
Zhou, Deyun [1]
Yang, Zhen [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China
[2] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110004, Peoples R China
[3] Aviat Ind Corp China, Shenyang Aircraft Design & Res Inst, Shenyang 110035, Peoples R China
Keywords
multiple UAVs; deep reinforcement learning; PPO; curriculum learning; Ray; navigation
DOI
10.3390/drones6070166
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology]
Discipline classification codes
081102; 0816; 081602; 083002; 1404
Abstract
Distributed multi-agent collaborative decision-making technology is the key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. Building on the distributed training framework Ray, and to address the poor performance of the traditional proximal policy optimization (PPO) algorithm in complex multi-agent collaboration scenarios, the Critic network in PPO is improved to learn a centralized value function, and the multi-agent proximal policy optimization (MAPPO) algorithm is proposed. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average cumulative reward among the compared algorithms and completes the task goal in the fewest steps after convergence, indicating that MAPPO outperforms the state-of-the-art algorithms it was compared against.
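The central idea in the abstract, a critic that learns a centralized value function while each UAV keeps its own policy, can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Ray or Unity3D APIs. The function names, the linear critic, and the clipping parameter value of 0.2 are all assumptions chosen for illustration. The sketch shows only two pieces: building the joint input that a centralized critic would see, and the standard PPO clipped surrogate term for a single sample.

```python
def centralized_critic_input(observations):
    """Concatenate every agent's local observation into one joint vector.

    Hypothetical helper: a centralized critic is assumed here to take the
    joint observation of all agents, while each actor sees only its own.
    """
    joint = []
    for obs in observations:
        joint.extend(obs)
    return joint


def linear_value(joint_obs, weights, bias=0.0):
    """Toy stand-in for a critic network: a linear value estimate."""
    if len(joint_obs) != len(weights):
        raise ValueError("weights must match the joint observation length")
    return sum(w * x for w, x in zip(weights, joint_obs)) + bias


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); the result is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Three hypothetical UAVs with two-dimensional local observations.
    local_obs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    joint = centralized_critic_input(local_obs)
    value = linear_value(joint, weights=[1.0] * len(joint))
    print("joint observation length:", len(joint))
    print("shared value estimate:", value)
    print("clipped term:", clipped_surrogate(ratio=1.5, advantage=1.0))
```

In this sketch every agent's advantage would be computed against the same shared value estimate, which is what distinguishes a centralized critic from running independent PPO learners side by side.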
Pages: 13