A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking

Cited by: 2
Authors
Wu, Jiying [1 ]
Yang, Zhong [1 ]
Liao, Luwei [1 ]
He, Naifeng [1 ]
Wang, Zhiyong [1 ]
Wang, Can [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Automat Engn, Nanjing 211106, Peoples R China
Keywords
trajectory tracking; deep reinforcement learning; deep deterministic policy gradient algorithm; state compensation network; reinforcement; quadrotor
DOI: 10.3390/machines10070496
CLC Classification
TM (Electrical Engineering); TN (Electronics and Communication Technology)
Subject Classification
0808; 0809
Abstract
Unmanned aerial vehicle (UAV) trajectory tracking controllers based on deep reinforcement learning generally train inefficiently in unknown environments and converge unstably. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is the compensation state and whose output is a compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist exploration during training. The action output of the DDPG actor is combined with the compensation output of the C-Net to form the action applied to the environment, enabling the UAV to track dynamic targets rapidly, accurately, and smoothly. In addition, random noise is added to the generated action to provide a controlled range of exploration and make the action-value estimate more accurate. The proposed method is verified with the OpenAI Gym toolkit, and the simulation results show that: (1) adding the compensation network significantly improves training efficiency as well as accuracy and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor-critic algorithm based on the behavioral value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; and (4) in the simulated tracking experiment, for the same training time, the post-stabilization tracking error of the proposed method is about 50% lower than that of QAC and DDPG.
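The core mechanism the abstract describes, combining the DDPG actor's action with a corrective action from the compensation network and then adding exploration noise, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the linear stand-in policies, the state/action dimensions, and the noise and clipping parameters are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearPolicy:
    """Stand-in for a neural-network policy: one linear layer with tanh
    squashing, scaled to the policy's output range (hypothetical)."""
    def __init__(self, state_dim, action_dim, scale=1.0):
        self.w = rng.normal(0.0, 0.1, size=(action_dim, state_dim))
        self.scale = scale

    def __call__(self, state):
        return self.scale * np.tanh(self.w @ state)

def cddpg_action(actor, c_net, state, comp_state, noise_std=0.1, a_max=1.0):
    """Combine the DDPG actor's output with the C-Net's compensation
    action, add exploration noise, and clip to the action bounds."""
    a = actor(state) + c_net(comp_state)            # compensated action
    a += rng.normal(0.0, noise_std, size=a.shape)   # exploration noise
    return np.clip(a, -a_max, a_max)

# Hypothetical UAV state and action sizes.
state_dim, action_dim = 12, 4
actor = LinearPolicy(state_dim, action_dim)
# Small output scale so the C-Net only nudges the actor's action.
c_net = LinearPolicy(state_dim, action_dim, scale=0.2)

s = rng.normal(size=state_dim)       # current state
s_comp = rng.normal(size=state_dim)  # compensation state (e.g. tracking error)
a = cddpg_action(actor, c_net, s, s_comp)
print(a.shape)  # (4,)
```

Keeping the C-Net's output range small relative to the actor's is one plausible way to let it refine, rather than dominate, the main policy's action; the abstract does not specify how the two outputs are weighted.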
Pages: 18