Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations

Cited by: 0
Authors
Skrynnik, Alexey [1]
Staroverov, Aleksey [1, 2]
Aitygulov, Ermek [2]
Aksenov, Kirill [2]
Davydov, Vasilii [3]
Panov, Aleksandr I. [1, 2]
Affiliations
[1] Artificial Intelligence Res Inst FRC CSC RAS, Moscow, Russia
[2] Moscow Inst Phys & Technol, Moscow, Russia
[3] Moscow Inst Aviat Technol, Moscow, Russia
Funding
Russian Science Foundation;
Keywords
Expert demonstrations; ForgER; Hierarchical reinforcement learning; Learning from demonstrations; Task-oriented augmentation; Goal-oriented reinforcement learning;
DOI
10.1016/j.knosys.2021.106844
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results are commonly achieved at the expense of huge computational costs and require an enormous number of episodes of interaction between the agent and the environment. Hierarchical methods and expert demonstrations are among the most promising approaches to improving the sample efficiency of reinforcement learning methods. In this paper, we propose a combination of methods that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our Forgetful Experience Replay (ForgER) algorithm effectively handles expert data errors and reduces quality losses when adapting the action space and state representation to the agent's capabilities. The proposed goal-oriented replay buffer structure allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is highly versatile and can be integrated into various off-policy methods. ForgER surpasses the existing state-of-the-art RL methods that use expert demonstrations in complex environments. The solution based on our algorithm beats other solutions in the MineRL competition and allows the agent to demonstrate expert-level behavior. (C) 2021 Elsevier B.V. All rights reserved.
Pages: 12
Related papers
50 in total
  • [1] Continuous Reinforcement Learning From Human Demonstrations With Integrated Experience Replay for Autonomous Driving
    Zuo, Sixiang
    Wang, Zhiyang
    Zhu, Xiaorui
    Ou, Yongsheng
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (IEEE ROBIO 2017), 2017, : 2450 - 2455
  • [2] Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay
    Yin, Haiyan
    Pan, Sinno Jialin
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1640 - 1646
  • [3] Autonomous reinforcement learning with experience replay
    Wawrzynski, Pawel
    Tanwani, Ajay Kumar
    [J]. NEURAL NETWORKS, 2013, 41 : 156 - 167
  • [4] On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
    Rudner, Tim G. J.
    Lu, Cong
    Osborne, Michael A.
    Gal, Yarin
    Teh, Yee Whye
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Reinforcement learning from expert demonstrations with application to redundant robot control
    Ramirez, Jorge
    Yu, Wen
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 119
  • [6] Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance
    Jing, Mingxuan
    Ma, Xiaojian
    Huang, Wenbing
    Sun, Fuchun
    Yang, Chao
    Fang, Bin
    Liu, Huaping
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5109 - 5116
  • [7] Model-free reinforcement learning from expert demonstrations: a survey
    Ramirez, Jorge
    Yu, Wen
    Perrusquia, Adolfo
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (04) : 3213 - 3241
  • [8] Model-free reinforcement learning from expert demonstrations: a survey
    Jorge Ramírez
    Wen Yu
    Adolfo Perrusquía
    [J]. Artificial Intelligence Review, 2022, 55 : 3213 - 3241
  • [9] SELECTIVE EXPERIENCE REPLAY IN REINFORCEMENT LEARNING FOR REIDENTIFICATION
    Thakoor, Ninad
    Bhanu, Bir
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 4250 - 4254
  • [10] Efficient experience replay architecture for offline reinforcement learning
    Zhang, Longfei
    Feng, Yanghe
    Wang, Rongxiao
    Xu, Yue
    Xu, Naifu
    Liu, Zeyi
    Du, Hang
    [J]. ROBOTIC INTELLIGENCE AND AUTOMATION, 2023, 43 (01) : 35 - 43