Addressing Delays in Reinforcement Learning via Delayed Adversarial Imitation Learning

Cited by: 2

Authors:

Xie, Minzhi [1]

Xia, Bo [1]

Yu, Yalou [1]

Wang, Xueqian [1]

Chang, Yongzhe [1]

Affiliations:

[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518000, Peoples R China

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT III | 2023 / Vol. 14256

Keywords:

Reinforcement Learning; Delays; Adversarial Imitation Learning;

DOI:

10.1007/978-3-031-44213-1_23

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Observation and action delays are common in many real-world tasks. They violate the Markov property and consequently degrade the performance of Reinforcement Learning (RL) methods. Several prior efforts have addressed delays in RL: model-based methods train forward models to predict the unknown current information, while model-free approaches rely on state augmentation to define new Markov Decision Processes. However, these approaches suffer from difficult model fine-tuning and the curse of dimensionality, which prevent them from handling delays effectively. Motivated by the advantages of imitation learning, we introduce a novel idea: a delayed policy can be trained by imitating undelayed expert demonstrations. Based on this idea, we propose an algorithm named Delayed Adversarial Imitation Learning (DAIL). In DAIL, a few undelayed expert demonstrations are used to generate a surrogate delayed expert, and a delayed policy is trained by imitating the surrogate expert using adversarial imitation learning. Moreover, a theoretical analysis of DAIL is presented to validate its rationale and guide the practical design of the approach. Finally, experiments on continuous control tasks demonstrate that DAIL achieves much higher performance than previous approaches to delays in RL: DAIL converges to high performance with excellent sample efficiency, even under substantial delays, whereas previous methods fail because of divergence problems.
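The state-augmentation idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper and is not DAIL itself: it assumes a constant, known delay and the common textbook construction in which the agent at step t sees the observation from step t - delay together with the actions issued since then. The function name `augment_with_delay` and the toy trajectory are hypothetical.

```python
def augment_with_delay(observations, actions, delay):
    """Build augmented states for a constant delay (illustrative assumption).

    For each step t >= delay, the augmented state is the pair
    (observation from step t - delay, actions taken at steps t - delay .. t - 1).
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    augmented = []
    for t in range(delay, len(observations)):
        delayed_obs = observations[t - delay]      # what the agent actually sees at step t
        pending = tuple(actions[t - delay:t])      # actions issued since that observation
        augmented.append((delayed_obs, pending))
    return augmented


# Toy trajectory: integer observations and string actions (hypothetical data).
obs = [0, 1, 2, 3, 4]
acts = ["a", "b", "c", "d", "e"]
states = augment_with_delay(obs, acts, delay=2)
```

Because each augmented state carries `delay` extra actions, its dimensionality grows with the delay, which is the curse of dimensionality the abstract refers to.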


Pages: 271-282

Page count: 12
