Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient

Cited by: 2

Authors:

Zhan, Ming [1]

Fan, Jingjing [2]

Guo, Jianying [3]

Affiliations:

[1] North China Univ Technol, Sch Elect & Control Engn, Beijing 100144, Peoples R China

[2] North China Univ Technol, Intelligent Transportat Key Lab, Beijing 100144, Peoples R China

[3] Tianjin Vocat Inst, Tianjin 300000, Peoples R China

Source:

IEEE ACCESS | 2023 / Vol. 11

Keywords:

Inverse reinforcement learning; generative adversarial networks; deep deterministic policy gradient

DOI:

10.1109/ACCESS.2023.3305453

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Introducing a generative adversarial network (GAN) into inverse reinforcement learning (IRL) resolves the scarcity of expert samples early in training, but the inherent drawbacks of GANs can make the generated samples ineffective. We therefore propose a generative adversarial IRL algorithm based on the deep deterministic policy gradient (DDPG). We replace the random-noise input of the original GAN model with a deterministic policy and rebuild the GAN generator on the Actor-Critic mechanism, which improves the quality of the generated samples during adversarial training. We then mix the GAN-generated virtual samples with the original IRL expert samples to form the expert sample set for IRL. This approach not only solves the sparse-expert-sample problem early in training but, more importantly, makes GAN-based IRL decision-making more efficient. In the subsequent IRL decision-making process, we determine the best reward function by analyzing the differences between the mixed expert samples and the non-expert trajectory samples generated by the initial policy. The learned reward function then drives the RL process to update and optimize the policy, which in turn generates further non-expert trajectory samples. By repeatedly comparing the new non-expert samples with the mixed expert sample set, the method aims to iteratively arrive at the reward function and the optimal policy. Performance tests in the MuJoCo physics simulation environment and trajectory-prediction experiments in Grid World show that our model improves the quality of GAN-generated samples and reduces the computational cost of network training by approximately 20% in each tested environment. The approach is applicable to decision planning for autonomous driving.
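The abstract describes the generator only at a high level. The following is a minimal pure-Python sketch, not the authors' implementation, of two of its ideas under toy assumptions: (1) a generator rebuilt as a deterministic actor that maps a state to an action instead of mapping random noise to a sample, updated with a DDPG-style deterministic policy gradient through a critic, and (2) mixing the generated virtual samples with the original expert samples. The 1-D linear expert rule, the quadratic critic fitted once by least squares (rather than trained adversarially), and all names (`K_TRUE`, `fit_critic`, `DeterministicActor`, `mixed_expert_set`) are hypothetical.

```python
import random

# Toy setting (hypothetical): 1-D states and actions; the sparse expert
# demonstrations follow a hidden linear rule a = K_TRUE * s.
K_TRUE = -0.5


def expert_samples(n, rng):
    """Return n (state, action) pairs standing in for sparse expert data."""
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(s, K_TRUE * s) for s in states]


def fit_critic(expert):
    """Hypothetical critic Q(s, a) = -(a - k*s)**2, with k fitted once to the
    expert pairs by least squares (not trained adversarially, a simplification).
    Returns k and the analytic gradient dQ/da."""
    k = sum(s * a for s, a in expert) / sum(s * s for s, _ in expert)

    def dq_da(s, a):
        return -2.0 * (a - k * s)

    return k, dq_da


class DeterministicActor:
    """Generator rebuilt as a deterministic actor mu(s) = w * s: its input is a
    state rather than random noise, so a given state always maps to one action."""

    def __init__(self, w=0.0):
        self.w = w

    def act(self, s):
        return self.w * s

    def update(self, states, dq_da, lr=0.1):
        # DDPG-style actor step (deterministic policy gradient):
        # dQ(s, mu(s))/dw = dQ/da * dmu/dw, where dmu/dw = s.
        # Take one gradient-ascent step on the mean over the batch.
        grad = sum(dq_da(s, self.act(s)) * s for s in states) / len(states)
        self.w += lr * grad


def mixed_expert_set(expert, actor, states):
    """Append generator-produced virtual pairs to the original expert pairs."""
    virtual = [(s, actor.act(s)) for s in states]
    return expert + virtual


rng = random.Random(0)
expert = expert_samples(5, rng)            # sparse expert set
k_hat, dq_da = fit_critic(expert)
actor = DeterministicActor()
for _ in range(200):
    actor.update([-1.0, -0.5, 0.5, 1.0], dq_da)
mixed = mixed_expert_set(expert, actor, [rng.uniform(-1.0, 1.0) for _ in range(20)])
print(len(mixed), round(actor.w, 3))       # → 25 -0.5
```

Because the critic's gradient points toward expert-like actions, each actor step shrinks the gap between `w` and `k` by a constant factor (here 1 - 2 · lr · mean(s²) = 0.875), so the virtual samples follow the expert rule before they are mixed in. The adversarial training of the GAN and the subsequent IRL reward-learning loop are not modeled in this sketch; in the method described above, the actor and critic would presumably be neural networks.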


Pages: 87732-87746

Number of pages: 15
