Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Cited by: 0
Authors
Li, Chao [1 ]
Wu, Fengge [1 ]
Zhao, Junsuo [1 ]
Affiliations
[1] Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China
Keywords
Deep Reinforcement Learning; Learning from Demonstrations; Self-Imitation Learning; Sample Efficiency
DOI
10.1109/IJCNN54540.2023.10191691
CLC classification
TP18 (Artificial Intelligence Theory)
Discipline codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) offers a new way to generate robot control policies. However, training a control policy requires lengthy exploration, so reinforcement learning (RL) suffers from low sample efficiency in real-world tasks. Both imitation learning (IL) and learning from demonstrations (LfD) speed up training by using expert demonstrations, but imperfect demonstrations can mislead policy improvement. Offline-to-online reinforcement learning requires a large amount of offline data to initialize the policy, and distribution shift easily degrades performance during online fine-tuning. To address these problems, we propose a learning-from-demonstrations method named Accelerating Self-Imitation Learning from Demonstrations (A-SILfD), which treats expert demonstrations as the agent's own successful experiences and uses these experiences to constrain policy improvement. Furthermore, we use an ensemble of Q-functions to prevent the performance degradation caused by large estimation errors in a single Q-function. Our experiments show that A-SILfD significantly improves sample efficiency using a small number of expert demonstrations of varying quality. On four MuJoCo continuous-control tasks, A-SILfD significantly outperforms baseline methods after 150,000 steps of online training and is not misled by imperfect demonstrations. Ablation experiments further demonstrate the effectiveness of each component of the method.
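The abstract names two mechanisms: a self-imitation buffer of successful experiences that constrains policy updates, and an ensemble of Q-functions whose conservative aggregate suppresses estimation error. The sketch below (plain PyTorch, not the authors' code) illustrates only the second mechanism under stated assumptions: the class names QNetwork and QEnsemble, the ensemble size of five, and the minimum-over-members aggregation are illustrative choices, since the paper's exact architecture is not given in this record.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """One ensemble member: a small MLP mapping (state, action) to a scalar Q-value."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class QEnsemble(nn.Module):
    """Several independently initialized Q-networks; a pessimistic aggregate
    of their outputs damps the per-network estimation errors that can
    destabilize online fine-tuning."""
    def __init__(self, obs_dim, act_dim, n_members=5):
        super().__init__()
        self.members = nn.ModuleList(
            [QNetwork(obs_dim, act_dim) for _ in range(n_members)]
        )

    def forward(self, obs, act):
        # Shape (n_members, batch, 1): one estimate per ensemble member.
        return torch.stack([q(obs, act) for q in self.members])

    def conservative(self, obs, act):
        # Minimum over members, usable when forming TD targets, e.g.
        # target = r + gamma * (1 - done) * min_i Q_i(s', pi(s')).
        return self.forward(obs, act).min(dim=0).values

# Illustrative usage with random data (dimensions are arbitrary):
ensemble = QEnsemble(obs_dim=11, act_dim=3)
obs, act = torch.randn(32, 11), torch.randn(32, 3)
with torch.no_grad():
    q_min = ensemble.conservative(obs, act)   # shape: (32, 1)

Taking the minimum over the ensemble is one common conservative choice (as in clipped double Q-learning); the paper may use a different aggregate, so treat this as a placeholder for whatever pessimistic estimate A-SILfD actually employs.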
Pages: 8