Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient

Cited by: 2
Authors
Zhan, Ming [1]
Fan, Jingjing [2]
Guo, Jianying [3]
Affiliations
[1] North China Univ Technol, Sch Elect & Control Engn, Beijing 100144, Peoples R China
[2] North China Univ Technol, Intelligent Transportat Key Lab, Beijing 100144, Peoples R China
[3] Tianjin Vocat Inst, Tianjin 300000, Peoples R China
Keywords
Inverse reinforcement learning; generative adversarial networks; deep deterministic policy gradient
DOI
10.1109/ACCESS.2023.3305453
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Although introducing a generative adversarial network (GAN) successfully resolves the issue of sparse expert samples at the early stage of training in inverse reinforcement learning (IRL), the inherent drawbacks of GANs result in ineffective generated samples. We therefore propose a generative adversarial inverse reinforcement learning algorithm based on deep deterministic policy gradient (DDPG). To improve the quality of GAN-generated samples during adversarial training, we replace the random noise input of the original GAN model with a deterministic policy and reconstruct the GAN generator on the Actor-Critic mechanism. We then mix the GAN-generated virtual samples with the original expert samples to form the expert sample set for IRL. This approach not only solves the problem of sparse expert samples at the early stage of training but, more importantly, makes the GAN-based IRL decision-making process more efficient. In the subsequent IRL decision-making process, we analyze the differences between the mixed expert samples and the non-expert trajectory samples generated by the initial policy to determine the best reward function. The learned reward function then drives the RL process for policy updating and optimization, from which further non-expert trajectory samples are generated. By repeatedly comparing the new non-expert samples with the mixed expert sample set, we aim to iteratively arrive at the reward function and the optimal policy. Performance tests in the MuJoCo physics simulation environment and trajectory prediction experiments in Grid World show that our model improves the quality of GAN-generated samples and reduces the computational cost of network training by approximately 20% in each tested environment, which makes it applicable to decision planning for autonomous driving.
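The loop described in the abstract (mix generated samples into the expert set, compare that set with non-expert samples, update the reward) can be illustrated with a minimal sketch. This is not the paper's DDPG-based method: it assumes a linear reward over hand-made feature vectors and a simple feature-matching update, and the function names (`mix_expert_set`, `reward_update`) and all numbers are hypothetical, chosen only for illustration.

```python
def mean_features(samples):
    """Average feature vector over a list of equally sized samples."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(dim)]


def mix_expert_set(expert, generated):
    """Combine real expert samples with generator-produced samples.

    Assumption: both are lists of feature vectors of the same length.
    """
    return list(expert) + list(generated)


def reward_update(weights, mixed_expert, policy_samples, lr=0.1):
    """One feature-matching step for a linear reward (illustrative only).

    Weights move toward features that the mixed expert set exhibits
    more strongly than the current non-expert (policy) samples.
    """
    mu_expert = mean_features(mixed_expert)
    mu_policy = mean_features(policy_samples)
    gap = [e - p for e, p in zip(mu_expert, mu_policy)]
    new_weights = [w + lr * g for w, g in zip(weights, gap)]
    return new_weights, gap


# Hypothetical toy data: two features per sample.
expert = [[1.0, 0.0], [1.0, 0.2]]      # sparse real expert samples
generated = [[0.8, 0.1], [1.2, 0.1]]   # stand-in for generator output
policy = [[0.0, 1.0], [0.2, 0.8]]      # stand-in for non-expert trajectories

mixed = mix_expert_set(expert, generated)
weights, gap = reward_update([0.0, 0.0], mixed, policy)
print(weights, gap)
```

In a full implementation, the updated reward would be passed to an RL learner, the resulting policy would produce new non-expert samples, and the comparison would repeat until the gap becomes small.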
Pages: 87732-87746
Page count: 15