Addressing Delays in Reinforcement Learning via Delayed Adversarial Imitation Learning

Cited by: 2
Authors
Xie, Minzhi [1 ]
Xia, Bo [1 ]
Yu, Yalou [1 ]
Wang, Xueqian [1 ]
Chang, Yongzhe [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518000, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT III | 2023 / Vol. 14256
Keywords
Reinforcement Learning; Delays; Adversarial Imitation Learning;
DOI
10.1007/978-3-031-44213-1_23
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Observation and action delays commonly occur in real-world tasks; they violate the Markov property and consequently degrade the performance of reinforcement learning (RL) methods. Several lines of work have addressed delays in RL: model-based methods train forward models to predict the unknown current state, while model-free approaches rely on state augmentation to define a new Markov Decision Process. However, previous works suffer from difficult model fine-tuning and the curse of dimensionality, which prevent them from handling delays effectively. Motivated by the strengths of imitation learning, we introduce a novel idea: a delayed policy can be trained by imitating undelayed expert demonstrations. Based on this idea, we propose an algorithm named Delayed Adversarial Imitation Learning (DAIL). In DAIL, a few undelayed expert demonstrations are used to construct a surrogate delayed expert, and a delayed policy is trained to imitate this surrogate expert via adversarial imitation learning. Moreover, we present a theoretical analysis of DAIL that validates its rationale and guides the practical design of the approach. Finally, experiments on continuous control tasks demonstrate that DAIL substantially outperforms previous approaches at solving delays in RL: it converges to high performance with excellent sample efficiency even under substantial delays, whereas previous methods diverge.
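The surrogate-delayed-expert construction described in the abstract can be sketched in code. The snippet below is a minimal illustration, assuming the common state-augmentation formulation of delayed MDPs (augmented state = last observed state plus the actions taken since); the function name and trajectory format are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def surrogate_delayed_expert(states, actions, delay):
    """Turn an undelayed expert trajectory into surrogate delayed
    expert pairs.

    The augmented state at step t concatenates the delayed observation
    s_{t-delay} with the actions a_{t-delay}, ..., a_{t-1} taken since
    it was observed; the imitation target is the expert action a_t.
    (Hypothetical helper mirroring the state-augmentation formulation.)
    """
    pairs = []
    for t in range(delay, len(actions)):
        aug = np.concatenate(
            [states[t - delay]] + [actions[k] for k in range(t - delay, t)]
        )
        pairs.append((aug, actions[t]))
    return pairs

# Example: a 5-step expert trajectory with 1-D states/actions, delay = 2.
states = [np.array([float(t)]) for t in range(5)]
actions = [np.array([10.0 + t]) for t in range(5)]
pairs = surrogate_delayed_expert(states, actions, delay=2)
# Each augmented state has dimension 1 + 2*1 = 3: (s_{t-2}, a_{t-2}, a_{t-1}).
```

In an adversarial-imitation setup, these (augmented state, action) pairs would play the role of "expert" samples for a GAIL-style discriminator, while the delayed policy generates the "fake" samples.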
Pages: 271 - 282
Page count: 12