A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs

Cited by: 0
Authors
Wang, Chang [1 ]
Yan, Chao [1 ]
Xiang, Xiaojia [1 ]
Zhou, Han [1 ]
Affiliations
[1] National University of Defense Technology, Changsha, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
unmanned aerial vehicle (UAV); flocking; reinforcement learning; actor-critic; experience replay; agents
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Controlling a squad of fixed-wing UAVs is challenging due to their kinematic complexity and the dynamics of the environment. In this paper, we develop a novel actor-critic reinforcement learning approach to solve the leader-follower flocking problem in continuous state and action spaces. Specifically, we propose the CACER algorithm, which represents both the actor and the critic with multilayer perceptrons; this deeper structure provides a better function approximator than the original continuous actor-critic learning automaton (CACLA) algorithm. In addition, we propose a double prioritized experience replay (DPER) mechanism to further improve training efficiency: state-transition samples are stored in two separate experience replay buffers for updating the actor and the critic, with sample priorities computed from the temporal-difference errors. We not only compare CACER with CACLA and the benchmark deep reinforcement learning algorithm DDPG in numerical simulation, but also demonstrate the performance of CACER in semi-physical simulation by transferring the policy learned in numerical simulation without any parameter tuning.
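To make the DPER mechanism described in the abstract more concrete, the following Python snippet shows one plausible way to organize the two prioritized buffers. It is a minimal sketch, not the authors' implementation: the class names PrioritizedBuffer and DoublePrioritizedReplay, the priority exponent alpha, and the rule of routing only positive-TD-error transitions to the actor buffer (in the spirit of CACLA) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


class PrioritizedBuffer:
    """Fixed-size replay buffer with proportional, TD-error-based prioritization."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly the priority shapes the sampling distribution (assumed value)
        self.eps = eps          # keeps every stored priority strictly positive
        self.data = []
        self.priorities = []

    def add(self, transition, td_error):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) >= self.capacity:   # evict the oldest sample when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        if not self.data:
            return []
        probs = np.asarray(self.priorities, dtype=np.float64)
        probs /= probs.sum()
        idx = np.random.choice(len(self.data),
                               size=min(batch_size, len(self.data)),
                               replace=False, p=probs)
        return [self.data[i] for i in idx]


class DoublePrioritizedReplay:
    """Two prioritized buffers: every transition is stored for the critic, while only
    transitions with a positive TD error (the action did better than the critic
    predicted, as in CACLA) are stored for the actor. The positive-TD-error gate
    is an assumption inferred from the CACLA lineage, not stated in the abstract."""

    def __init__(self, capacity):
        self.critic_buffer = PrioritizedBuffer(capacity)
        self.actor_buffer = PrioritizedBuffer(capacity)

    def add(self, transition, td_error):
        self.critic_buffer.add(transition, td_error)
        if td_error > 0:
            self.actor_buffer.add(transition, td_error)

    def sample(self, batch_size):
        return (self.critic_buffer.sample(batch_size),
                self.actor_buffer.sample(batch_size))


if __name__ == "__main__":
    replay = DoublePrioritizedReplay(capacity=10000)
    # A transition would be (state, action, reward, next_state); the TD error comes from the critic.
    replay.add(("s0", "a0", 1.0, "s1"), td_error=0.8)
    replay.add(("s1", "a1", -0.5, "s2"), td_error=-0.3)
    critic_batch, actor_batch = replay.sample(batch_size=2)
    print(len(critic_batch), len(actor_batch))  # 2 transitions for the critic, 1 for the actor
```

Under these assumptions, separating the buffers lets the critic learn from both good and bad experiences, while the actor is only pulled toward actions that empirically outperformed the critic's estimate, mirroring CACLA's update rule.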
Pages: 64-79
Page count: 16
Related Papers
50 items in total; the first 10 are listed below.
[1] Yan, Chao; Xiang, Xiaojia; Wang, Chang. Fixed-Wing UAVs flocking in continuous spaces: A deep reinforcement learning approach. Robotics and Autonomous Systems, 2020, 131.
[2] Yan, Chao; Xiang, Xiaojia; Wang, Chang; Lan, Zhen. Flocking and Collision Avoidance for a Dynamic Squad of Fixed-Wing UAVs Using Deep Reinforcement Learning. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021: 4738-4744.
[3] Zhen, Yan; Hao, Mingrui; Sun, Wendi. Deep Reinforcement Learning Attitude Control of Fixed-Wing UAVs. Proceedings of 2020 3rd International Conference on Unmanned Systems (ICUS), 2020: 239-244.
[4] Quintero, Steven A. P.; Collins, Gaemus E.; Hespanha, Joao P. Flocking with Fixed-Wing UAVs for Distributed Sensing: A Stochastic Optimal Control Approach. 2013 American Control Conference (ACC), 2013: 2025-2031.
[5] Panov, A. I.; Ugadiarov, L. A. A World Model for Actor-Critic in Reinforcement Learning. Pattern Recognition and Image Analysis, 2023, 33(3): 467-477.
[6] Roeder, Frank; Eppe, Manfred; Nguyen, Phuong D. H.; Wermter, Stefan. Curious Hierarchical Actor-Critic Reinforcement Learning. Artificial Neural Networks and Machine Learning, ICANN 2020, Part II, 2020, 12397: 408-419.
[7] Zaki, Mohammadi; Mohan, Avinash; Gopalan, Aditya; Mannor, Shie. Actor-Critic based Improper Reinforcement Learning. International Conference on Machine Learning, Vol. 162, 2022.
[8] Zheng, Jiaohao; Kurt, Mehmet Necip; Wang, Xiaodong. Integrated Actor-Critic for Deep Reinforcement Learning. Artificial Neural Networks and Machine Learning, ICANN 2021, Part IV, 2021, 12894: 505-518.
[9] Wang, Xue-Song; Cheng, Yu-Hu; Yi, Jian-Qiang. A fuzzy Actor-Critic reinforcement learning network. Information Sciences, 2007, 177(18): 3764-3781.
[10] Mustapha, S. M.; Lachiver, G. A modified actor-critic reinforcement learning algorithm. 2000 Canadian Conference on Electrical and Computer Engineering, Conference Proceedings, Vols. 1 and 2: Navigating to a New Era, 2000: 605-609.