Addressing Delays in Reinforcement Learning via Delayed Adversarial Imitation Learning

Cited by: 2
Authors
Xie, Minzhi [1 ]
Xia, Bo [1 ]
Yu, Yalou [1 ]
Wang, Xueqian [1 ]
Chang, Yongzhe [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518000, Peoples R China
Keywords
Reinforcement Learning; Delays; Adversarial Imitation Learning
DOI
10.1007/978-3-031-44213-1_23
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Observation and action delays are common in real-world tasks; they violate the Markov property and consequently degrade the performance of reinforcement learning (RL) methods. Several lines of work have addressed delays in RL: model-based methods train forward models to predict the unknown current information, while model-free approaches rely on state augmentation to define new Markov decision processes. However, previous works suffer from difficult model fine-tuning and the curse of dimensionality, which prevent them from handling delays effectively. Motivated by the strengths of imitation learning, we introduce the idea that a delayed policy can be trained by imitating undelayed expert demonstrations, and we propose an algorithm named Delayed Adversarial Imitation Learning (DAIL). In DAIL, a few undelayed expert demonstrations are used to generate a surrogate delayed expert, and a delayed policy is trained to imitate this surrogate expert via adversarial imitation learning. We also present a theoretical analysis that validates the rationale of DAIL and guides its practical design. Finally, experiments on continuous control tasks demonstrate that DAIL substantially outperforms previous approaches to delays in RL: it converges to high performance with excellent sample efficiency even under substantial delays, whereas previous methods fail to converge.
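The abstract states the mechanism only at a high level, so the following minimal Python/PyTorch sketch shows one plausible reading of the DAIL recipe, not the authors' published code: augment the delayed observation with the buffer of the most recent d actions (the standard state-augmentation construction for constant-delay MDPs), relabel an undelayed expert trajectory into surrogate delayed expert (state, action) pairs, and train a GAIL-style discriminator whose confusion supplies the imitation reward. The constant delay, the dimensions, and the helper names (augment, surrogate_delayed_expert, discriminator_step, imitation_reward) are all illustrative assumptions.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, DELAY = 8, 2, 3           # illustrative sizes, not from the paper
AUG_DIM = STATE_DIM + DELAY * ACTION_DIM         # dimension of the augmented state

def augment(last_seen_state, action_buffer):
    # Augmented state x_t = (s_{t-d}, a_{t-d}, ..., a_{t-1}): the last observed
    # state concatenated with the d actions taken since it was observed.
    return torch.cat([last_seen_state, action_buffer.flatten()])

def surrogate_delayed_expert(states, actions, delay):
    # Relabel an undelayed expert trajectory as delayed (augmented state, action)
    # pairs: at step t the surrogate expert pairs x_t with the action the
    # undelayed expert actually took at t.
    pairs = []
    for t in range(delay, len(actions)):
        buf = torch.stack(actions[t - delay:t])  # a_{t-d}, ..., a_{t-1}
        pairs.append((augment(states[t - delay], buf), actions[t]))
    return pairs

# GAIL-style discriminator over (augmented state, action) pairs; its confusion
# provides the imitation reward for an off-the-shelf RL learner.
disc = nn.Sequential(nn.Linear(AUG_DIM + ACTION_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_xa, policy_xa):
    # One adversarial update: expert pairs are labeled 1, policy pairs 0.
    logits = disc(torch.cat([expert_xa, policy_xa]))
    labels = torch.cat([torch.ones(len(expert_xa), 1), torch.zeros(len(policy_xa), 1)])
    loss = bce(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def imitation_reward(xa):
    # Standard GAIL-form reward: high when the discriminator mistakes a
    # policy pair for an expert pair.
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(xa)) + 1e-8)

# Toy end-to-end check with random stand-ins for real demonstrations and rollouts.
states = [torch.randn(STATE_DIM) for _ in range(10)]
actions = [torch.randn(ACTION_DIM) for _ in range(10)]
expert_pairs = surrogate_delayed_expert(states, actions, DELAY)
expert_batch = torch.stack([torch.cat([x, a]) for x, a in expert_pairs])
policy_batch = torch.randn(len(expert_pairs), AUG_DIM + ACTION_DIM)
print(discriminator_step(expert_batch, policy_batch), imitation_reward(policy_batch).mean())

In a full implementation the policy batch would come from rollouts of the delayed learner and the reward would be fed to an RL algorithm; the sketch only illustrates the trajectory relabeling and the adversarial objective.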
Pages: 271-282
Number of pages: 12
Related Papers
50 records in total
  • [1] Delayed Reinforcement Learning by Imitation
    Liotet, Pierre
    Maran, Davide
    Bisi, Lorenzo
    Restelli, Marcello
    International Conference on Machine Learning, Vol 162, 2022
  • [2] Adversarial Imitation Learning via Random Search
    Shin, MyungJae
    Kim, Joongheon
    2019 International Joint Conference on Neural Networks (IJCNN), 2019
  • [3] Addressing implicit bias in adversarial imitation learning with mutual information
    Zhang, Lihua
    Liu, Quan
    Zhu, Fei
    Huang, Zhigang
    Neural Networks, 2023, 167: 847-864
  • [4] Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review
    Zhang, K.
    Yu, Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (02): 254-261
  • [5] Developing multi-agent adversarial environment using reinforcement learning and imitation learning
    Han, Ziyao
    Liang, Yupeng
    Ohkura, Kazuhiro
    Artificial Life and Robotics, 2023, 28 (04): 703-709
  • [6] Multimodal Storytelling via Generative Adversarial Imitation Learning
    Chen, Zhiqian
    Zhang, Xuchao
    Boedihardjo, Arnold P.
    Dai, Jing
    Lu, Chang-Tien
    Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017: 3967-3973
  • [7] Imitation and reinforcement learning
    Kober, J.
    Peters, J.
    IEEE Robotics and Automation Magazine, 2010, 17 (02): 55-62
  • [8] Generative Adversarial Imitation Learning
    Ho, Jonathan
    Ermon, Stefano
    Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016
  • [9] UAV Control Method Combining Reptile Meta-Reinforcement Learning and Generative Adversarial Imitation Learning
    Jiang, Shui
    Ge, Yanning
    Yang, Xu
    Yang, Wencheng
    Cui, Hui
    Future Internet, 2024, 16 (03)