A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking

Cited by: 2
Authors
Wu, Jiying [1 ]
Yang, Zhong [1 ]
Liao, Luwei [1 ]
He, Naifeng [1 ]
Wang, Zhiyong [1 ]
Wang, Can [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Automat Engn, Nanjing 211106, Peoples R China
Keywords
trajectory tracking; deep reinforcement learning; deep deterministic policy gradient algorithm; state compensation network; reinforcement; quadrotor
DOI: 10.3390/machines10070496
CLC Classification
TM (Electrical Engineering); TN (Electronics and Communication Technology)
Subject Classification
0808; 0809
Abstract
Unmanned aerial vehicle (UAV) trajectory tracking controllers based on deep reinforcement learning generally train inefficiently in unknown environments and converge unstably. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is the compensation state and whose output is a compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist exploration during training. The action output of the DDPG actor is combined with the compensation output of the C-Net to form the action applied to the environment, enabling the UAV to track dynamic targets rapidly, accurately, and smoothly. In addition, random noise is added to the generated action to provide a controlled range of exploration and make the action-value estimate more accurate. The proposed method is verified with the OpenAI Gym toolkit, and the simulation results show that: (1) adding the compensation network significantly improves training efficiency as well as accuracy and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor-critic algorithm based on the behavioral value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; and (4) in the simulated tracking experiment, for the same training time, the post-stabilization tracking error of the proposed method is about 50% lower than that of QAC and DDPG.
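The core mechanism the abstract describes, combining the DDPG actor's action with a corrective action from the compensation network and then adding exploration noise, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the linear stand-in policies, the state/action dimensions, and the noise and clipping parameters are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearPolicy:
    """Stand-in for a neural-network policy: one linear layer with tanh
    squashing, scaled to the policy's output range (hypothetical)."""
    def __init__(self, state_dim, action_dim, scale=1.0):
        self.w = rng.normal(0.0, 0.1, size=(action_dim, state_dim))
        self.scale = scale

    def __call__(self, state):
        return self.scale * np.tanh(self.w @ state)

def cddpg_action(actor, c_net, state, comp_state, noise_std=0.1, a_max=1.0):
    """Combine the DDPG actor's output with the C-Net's compensation
    action, add exploration noise, and clip to the action bounds."""
    a = actor(state) + c_net(comp_state)            # compensated action
    a += rng.normal(0.0, noise_std, size=a.shape)   # exploration noise
    return np.clip(a, -a_max, a_max)

# Hypothetical UAV state and action sizes.
state_dim, action_dim = 12, 4
actor = LinearPolicy(state_dim, action_dim)
# Small output scale so the C-Net only nudges the actor's action.
c_net = LinearPolicy(state_dim, action_dim, scale=0.2)

s = rng.normal(size=state_dim)       # current state
s_comp = rng.normal(size=state_dim)  # compensation state (e.g. tracking error)
a = cddpg_action(actor, c_net, s, s_comp)
print(a.shape)  # (4,)
```

Keeping the C-Net's output range small relative to the actor's is one plausible way to let it refine, rather than dominate, the main policy's action; the abstract does not specify how the two outputs are weighted.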
Pages: 18