A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking

Cited by: 2
Authors
Wu, Jiying [1 ]
Yang, Zhong [1 ]
Liao, Luwei [1 ]
He, Naifeng [1 ]
Wang, Zhiyong [1 ]
Wang, Can [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Automat Engn, Nanjing 211106, Peoples R China
Keywords
trajectory tracking; deep reinforcement learning; deep deterministic policy gradient algorithm; state compensation network; REINFORCEMENT; QUADROTOR;
DOI
10.3390/machines10070496
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Unmanned aerial vehicle (UAV) trajectory tracking control algorithms based on deep reinforcement learning generally train inefficiently in unknown environments, and their convergence is unstable. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is the compensation state and whose output is the compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist exploration during training. The action output of the DDPG network is combined with the compensation output of the C-Net to form the action that interacts with the environment, enabling the UAV to track dynamic targets rapidly, accurately, and smoothly. In addition, random noise is added to the generated action to realize exploration within a certain range and make the action-value estimation more accurate. The proposed method is verified with the OpenAI Gym toolkit, and the simulation results show that: (1) adding the compensation network significantly improves training efficiency, accuracy, and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor-critic algorithm based on the action value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; (4) in the simulated tracking experiments, with the same training time, the tracking error of the proposed method after stabilization is about 50% lower than that of QAC and DDPG.
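The action-selection step the abstract describes (DDPG actor output, plus C-Net compensation output, plus bounded exploration noise) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `actor` and `c_net` callables are hypothetical stand-ins for the trained networks, and the Gaussian noise model and clipping bounds are assumptions.

```python
import numpy as np

def cddpg_action(actor, c_net, state, comp_state,
                 noise_sigma=0.1, low=-1.0, high=1.0, rng=None):
    """Combine the DDPG actor's action with the C-Net's compensation
    action, add Gaussian exploration noise, and clip to action bounds.

    `actor` and `c_net` are any callables mapping a state vector to an
    action vector (stand-ins for the trained networks).
    """
    if rng is None:
        rng = np.random.default_rng()
    a_base = np.asarray(actor(state), dtype=float)        # DDPG actor output
    a_comp = np.asarray(c_net(comp_state), dtype=float)   # compensation output
    noise = rng.normal(0.0, noise_sigma, size=a_base.shape)  # exploration noise
    return np.clip(a_base + a_comp + noise, low, high)

# Toy usage with linear stand-in "networks" and noise disabled:
state = np.array([0.2, -0.1])
actor = lambda s: 0.5 * s
c_net = lambda s: -0.1 * s
action = cddpg_action(actor, c_net, state, state, noise_sigma=0.0)
```

With noise disabled, the combined action is simply the sum of the two network outputs, clipped to the actuator range; during training, the noise term provides the "certain range of exploration" mentioned in the abstract.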
Pages: 18
Related Papers
50 records in total
  • [21] UAV Path Planning Based on Multicritic-Delayed Deep Deterministic Policy Gradient
    Wu, Runjia
    Gu, Fangqing
    Liu, Hai-lin
    Shi, Hongjian
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [22] Unmanned Surface Vehicle Course Tracking Control Based on Neural Network and Deep Deterministic Policy Gradient Algorithm
    Wang, Yan
    Tong, Jie
    Song, Tian-Yu
    Wan, Zhan-Hong
    2018 OCEANS - MTS/IEEE KOBE TECHNO-OCEANS (OTO), 2018,
  • [23] Path Tracking Control of Autonomous Ground Vehicles Via Model Predictive Control and Deep Deterministic Policy Gradient Algorithm
    Xue, Zhongjin
    Li, Liang
    Zhong, Zhihua
    Zhao, Jintao
    2021 32ND IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2021, : 1220 - 1227
  • [24] End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution
    Wang T.
    Luo Y.
    Liu J.
    Li K.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2021, 61 (09): : 881 - 888
  • [25] Policy Space Noise in Deep Deterministic Policy Gradient
    Yan, Yan
    Liu, Quan
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 624 - 634
  • [26] UAV Coverage Path Planning With Quantum-Based Recurrent Deep Deterministic Policy Gradient
    Silvirianti
    Narottama, Bhaskara
    Shin, Soo Young
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73 (05) : 7424 - 7429
  • [27] Implementation of partially tuned PD controllers of a multirotor UAV using deep deterministic policy gradient
    Emmanuel Mosweu
    Tshepo Botho Seokolo
    Theddeus Tochukwu Akano
    Oboetswe Seraga Motsamai
    Journal of Electrical Systems and Information Technology, 11 (1)
  • [28] Deep Deterministic Policy Gradient-Based Algorithm for Computation Offloading in IoV
    Li, Haofei
    Chen, Chen
    Shan, Hangguan
    Li, Pu
    Chang, Yoong Choon
    Song, Houbing
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (03) : 2522 - 2533
  • [29] Deep deterministic policy gradient algorithm for crowd-evacuation path planning
    Li, Xinjin
    Liu, Hong
    Li, Junqing
    Li, Yan
    COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 161
  • [30] A dosing strategy model of deep deterministic policy gradient algorithm for sepsis patients
    Tianlai Lin
    Xinjue Zhang
    Jianbing Gong
    Rundong Tan
    Weiming Li
    Lijun Wang
    Yingxia Pan
    Xiang Xu
    Junhui Gao
    BMC Medical Informatics and Decision Making, 23