A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking

Cited by: 2
Authors
Wu, Jiying [1 ]
Yang, Zhong [1 ]
Liao, Luwei [1 ]
He, Naifeng [1 ]
Wang, Zhiyong [1 ]
Wang, Can [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Automat Engn, Nanjing 211106, Peoples R China
Keywords
trajectory tracking; deep reinforcement learning; deep deterministic policy gradient algorithm; state compensation network; REINFORCEMENT; QUADROTOR;
D O I
10.3390/machines10070496
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
Unmanned aerial vehicle (UAV) trajectory tracking control algorithms based on deep reinforcement learning generally train inefficiently in unknown environments, and their convergence is unstable. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is the compensation state and whose output is a compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist exploration during training. The algorithm combines the action output of the DDPG network with the compensation output of the C-Net to form the action that interacts with the environment, enabling the UAV to rapidly track dynamic targets as accurately, continuously, and smoothly as possible. In addition, random noise is added to the generated action to allow a certain range of exploration and to make the action-value estimation more accurate. The proposed method is verified with the OpenAI Gym toolkit, and the simulation results show that: (1) adding the compensation network significantly improves training efficiency and effectively improves accuracy and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor-critic algorithm based on the behavioral value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; (4) in the simulated tracking experiments, with the same training time, the tracking error of the proposed method after stabilization is about 50% lower than that of QAC and DDPG.
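The action-composition step described in the abstract — base DDPG action plus C-Net compensation plus exploration noise, clipped to the action bounds — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `ddpg_action` and `compensation_action` are hypothetical stand-ins for the trained actor and C-Net, and the noise scale and bounds are assumed values.

```python
import math
import random

def ddpg_action(state):
    # Stand-in for the trained DDPG actor network (hypothetical).
    return [math.tanh(s) for s in state]

def compensation_action(compensation_state):
    # Stand-in for the C-Net: maps the compensation (tracking-error)
    # state to a small corrective action (hypothetical gain of 0.1).
    return [0.1 * e for e in compensation_state]

def cddpg_action(state, target_state, noise_scale=0.05, low=-1.0, high=1.0):
    """Combine the DDPG actor output with the C-Net compensation,
    add Gaussian exploration noise, and clip to the action bounds."""
    # Compensation state: the error between the target and current state.
    comp_state = [t - s for t, s in zip(target_state, state)]
    base = ddpg_action(state)
    comp = compensation_action(comp_state)
    noisy = [b + c + random.gauss(0.0, noise_scale)
             for b, c in zip(base, comp)]
    return [max(low, min(high, a)) for a in noisy]

# Example: 3-dimensional state, tracking a fixed target point.
action = cddpg_action([0.2, -0.4, 0.1], [0.5, 0.0, 0.0])
print(action)
```

In this sketch the compensation term grows with the tracking error, so the combined action pulls the UAV toward the target even before the base actor has fully converged, which matches the abstract's claim that the C-Net assists exploration during training.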
Pages: 18