A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking

Cited by: 2
Authors
Wu, Jiying [1 ]
Yang, Zhong [1 ]
Liao, Luwei [1 ]
He, Naifeng [1 ]
Wang, Zhiyong [1 ]
Wang, Can [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Automat Engn, Nanjing 211106, Peoples R China
Keywords
trajectory tracking; deep reinforcement learning; deep deterministic policy gradient algorithm; state compensation network; REINFORCEMENT; QUADROTOR;
DOI
10.3390/machines10070496
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Code
0808 ; 0809 ;
Abstract
Unmanned aerial vehicle (UAV) trajectory tracking control algorithms based on deep reinforcement learning generally train inefficiently in unknown environments, and their convergence is unstable. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is a compensation state and whose output is a compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist in exploration during training. The action output of the DDPG network is combined with the compensation output of the C-Net to form the action applied to the environment, enabling the UAV to track dynamic targets rapidly, accurately, and smoothly. In addition, random noise is added to the generated action to provide a bounded range of exploration and make the action-value estimation more accurate. The proposed method is verified with the OpenAI Gym toolkit, and the simulation results show that: (1) adding the compensation network significantly improves training efficiency as well as accuracy and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor-critic algorithm based on the behavioral value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; (4) in the simulated tracking experiments, for the same training time, the steady-state tracking error of the proposed method is about 50% lower than that of QAC and DDPG.
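The action-selection step described in the abstract (DDPG actor output plus C-Net compensation plus exploration noise) can be sketched as below. This is a minimal illustration only: the network sizes, the additive combination of the two outputs, the Gaussian noise model, and all dimension choices are assumptions for exposition, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    # One hidden layer with tanh activations; output bounded in [-1, 1].
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return np.tanh(h @ W2 + b2)

def init(in_dim, hidden, out_dim):
    # Small random initialization for a single-hidden-layer network.
    return (rng.normal(0.0, 0.1, (in_dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.1, (hidden, out_dim)), np.zeros(out_dim))

STATE_DIM, ACT_DIM = 6, 3            # e.g. 3-axis tracking error + velocity error -> 3-axis command
actor = init(STATE_DIM, 32, ACT_DIM)  # DDPG actor network
c_net = init(STATE_DIM, 32, ACT_DIM)  # compensation network (C-Net)

def select_action(state, noise_std=0.1):
    a_base = mlp(actor, state)                      # base DDPG action
    a_comp = mlp(c_net, state)                      # compensation action from C-Net
    noise = rng.normal(0.0, noise_std, ACT_DIM)     # exploration noise on the combined action
    return np.clip(a_base + a_comp + noise, -1.0, 1.0)

state = rng.normal(size=STATE_DIM)   # a dummy tracking-error state
action = select_action(state)        # combined action sent to the environment
```

In this sketch the compensation is simply additive and the result is clipped to the actuator range; the paper's actual combination rule and training procedure for the C-Net may differ.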
Pages: 18
Related Papers (50 total)
  • [1] Deep deterministic policy gradient algorithm for UAV control
    Huang X.
    Liu J.
    Jia C.
    Wang Z.
    Zhang J.
    Hangkong Xuebao/Acta Aeronautica et Astronautica Sinica, 2021, 42 (11):
  • [2] Trajectory tracking of piezoelectric actuators using state-compensated iterative learning control
    Lee, Fu-Shin
    Chien, Chiang-Ju
    Wang, Jhen-Cheng
    JOURNAL OF INTELLIGENT MATERIAL SYSTEMS AND STRUCTURES, 2007, 18 (06) : 555 - 567
  • [3] Deep deterministic policy gradient based multi-UAV control for moving convoy tracking
    Garg, Armaan
    Jha, Shashi Shekhar
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [4] Compensation Control of UAV Based on Deep Deterministic Policy Gradient
    Xu, Zijun
    Qi, Juntong
    Wang, Mingming
    Wu, Chong
    Yang, Guang
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 2289 - 2296
  • [5] Deep deterministic policy gradient algorithm: A systematic review
    Sumiea, Ebrahim Hamid
    Abdulkadir, Said Jadid
    Alhussian, Hitham Seddig
    Al-Selwi, Safwan Mahmood
    Alqushaibi, Alawi
    Ragab, Mohammed Gamal
    Fati, Suliman Mohamed
    HELIYON, 2024, 10 (09)
  • [6] Target tracking strategy using deep deterministic policy gradient
    You, Shixun
    Diao, Ming
    Gao, Lipeng
    Zhang, Fulong
    Wang, Huan
    APPLIED SOFT COMPUTING, 2020, 95
  • [7] A deep deterministic policy gradient algorithm based on averaged state-action estimation
    Xu, Jian
    Zhang, Haifei
    Qiu, Jianlin
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [8] Unmanned Aerial Vehicle Trajectory Planning and Power Control Algorithm Based on Deep Deterministic Policy Gradient
    Yang Q.
    Chen J.
    Peng Y.
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2023, 46 (03): : 43 - 48
  • [9] A model predictive control trajectory tracking lateral controller for autonomous vehicles combined with deep deterministic policy gradient
    Xie, Zhaokang
    Huang, Xiaoci
    Luo, Suyun
    Zhang, Ruoping
    Ma, Fang
    TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2024, 46 (08) : 1507 - 1519
  • [10] Controlling Bicycle Using Deep Deterministic Policy Gradient Algorithm
    Le Pham Tuyen
    Chung, TaeChoong
    2017 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2017, : 413 - 417