Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

Cited by: 0
Authors
Jing, Mingxuan [1]
Ma, Xiaojian [1,2]
Huang, Wenbing [1]
Sun, Fuchun [1]
Yang, Chao [1]
Fang, Bin [1]
Liu, Huaping [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing Natl Res Ctr Informat Sci & Technol (BNRist), State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] Univ Calif Los Angeles, Dept Comp Sci, Ctr Vis Cognit Learning & Auton, Los Angeles, CA 90095 USA
Funding
US National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require the demonstrations to be perfect and sufficient, which is rarely realistic in practice. To handle imperfect demonstrations, we first formally define an imperfect-expert setting for RLfD and then show that previous methods suffer from two issues, concerning optimality and convergence, respectively. Building on these theoretical findings, we address both issues by treating the expert guidance as a soft constraint that regulates the agent's policy exploration, which leads to a constrained optimization problem. We further show that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvements over other RLfD counterparts.
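To make the approach described in the abstract concrete, the following is a minimal LaTeX sketch of a generic constrained formulation and its Lagrangian, assuming a policy return J(\pi), a divergence D between the agent's and the expert's occupancy measures \rho_\pi and \rho_E, and a tolerance \epsilon; this notation is illustrative and not necessarily the paper's exact definitions.

% Expert guidance as a soft constraint on policy search (illustrative notation)
\max_{\pi} \; J(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{s.t.} \quad D\!\left( \rho_{\pi}, \rho_{E} \right) \le \epsilon .

% Lagrangian: the (possibly imperfect) expert enters as a penalty weighted by
% a multiplier \lambda \ge 0 rather than as a hard matching requirement.
\mathcal{L}(\pi, \lambda) = J(\pi) - \lambda \left( D(\rho_{\pi}, \rho_{E}) - \epsilon \right),
\qquad \text{dual problem:} \quad \min_{\lambda \ge 0} \, \max_{\pi} \, \mathcal{L}(\pi, \lambda).

As the abstract states, the method operates on the dual form via a local linear search; the paper's specific choice of D and the details of that search go beyond what the abstract specifies.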
Pages: 5109-5116
Page count: 8
Related papers (50 total)
  • [1] Huang, Fuxian; Ji, Naye; Ni, Huajian; Li, Shijian; Li, Xi. Adaptive cooperative exploration for reinforcement learning from imperfect demonstrations. Pattern Recognition Letters, 2023, 165: 176-182.
  • [2] Rudner, Tim G. J.; Lu, Cong; Osborne, Michael A.; Gal, Yarin; Teh, Yee Whye. On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [3] Skrynnik, Alexey; Staroverov, Aleksey; Aitygulov, Ermek; Aksenov, Kirill; Davydov, Vasilii; Panov, Aleksandr I. Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations. Knowledge-Based Systems, 2021, 218.
  • [4] Ramirez, Jorge; Yu, Wen. Reinforcement learning from expert demonstrations with application to redundant robot control. Engineering Applications of Artificial Intelligence, 2023, 119.
  • [5] Ramirez, Jorge; Yu, Wen; Perrusquia, Adolfo. Model-free reinforcement learning from expert demonstrations: a survey. Artificial Intelligence Review, 2022, 55(4): 3213-3241.
  • [6] Wu, Yuchen; Mozifian, Melissa; Shkurti, Florian. Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models. 2021 IEEE International Conference on Robotics and Automation (ICRA 2021), 2021: 6628-6634.
  • [7] Wang, Shihmin; Zhao, Binqi; Zhang, Zhengfeng; Zhang, Junping; Pu, Jian. Embedding expert demonstrations into clustering buffer for effective deep reinforcement learning. Frontiers of Information Technology & Electronic Engineering, 2023, 24(11): 1541-1556.
  • [8] Liu, Haochen; Huang, Zhiyu; Wu, Jingda; Lv, Chen. Improved Deep Reinforcement Learning with Expert Demonstrations for Urban Autonomous Driving. 2022 IEEE Intelligent Vehicles Symposium (IV), 2022: 921-928.
  • [9] Wang, Yunke; Xu, Chang; Du, Bo; Lee, Honglak. Learning to Weight Imperfect Demonstrations. International Conference on Machine Learning (ICML), Vol. 139, 2021.