Model-Based Imitation Learning Using Entropy Regularization of Model and Policy

Cited by: 1
Author(s)
Uchibe, Eiji [1 ]
Affiliation(s)
[1] ATR Computat Neurosci Labs, Dept Brain Robot Interface, Kyoto 6190288, Japan
Keywords
Imitation learning; machine learning for robot control; reinforcement learning
DOI
10.1109/LRA.2022.3196139
CLC classification
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
Approaches based on generative adversarial networks for imitation learning are promising because they are sample efficient in terms of expert demonstrations. However, training the generator requires many interactions with the actual environment because model-free reinforcement learning is adopted to update the policy. To improve sample efficiency with model-based reinforcement learning, we propose Model-Based Entropy-Regularized Imitation Learning (MB-ERIL) under the entropy-regularized Markov decision process, which reduces the number of interactions with the actual environment. MB-ERIL uses two discriminators: a policy discriminator distinguishes actions generated by the robot from expert actions, and a model discriminator distinguishes counterfactual state transitions generated by the model from actual ones. We derive structured discriminators so that learning of the policy and the model is efficient. Computer simulations and real robot experiments show that MB-ERIL achieves competitive performance and significantly improves sample efficiency compared to baseline methods.
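
To make the two-discriminator idea in the abstract concrete, here is a minimal PyTorch sketch written under loud assumptions: the network shapes, the plain binary-cross-entropy losses, and every name (PolicyDiscriminator, ModelDiscriminator, discriminator_losses) are illustrative inventions rather than the authors' code, and the structured discriminators and entropy regularization that MB-ERIL actually derives are not reproduced here.

# Illustrative sketch only -- NOT the MB-ERIL implementation. It shows the
# generic GAN-style setup the abstract describes: a policy discriminator over
# (state, action) pairs and a model discriminator over (s, a, s') transitions.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    # Small two-layer network; the size is an arbitrary choice for the sketch.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )

class PolicyDiscriminator(nn.Module):
    """Scores (state, action) pairs: expert data (label 1) vs. policy (label 0)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = mlp(state_dim + action_dim, 1)

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class ModelDiscriminator(nn.Module):
    """Scores transitions (s, a, s'): actual environment (1) vs. learned model (0)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = mlp(2 * state_dim + action_dim, 1)

    def forward(self, state, action, next_state):
        return self.net(torch.cat([state, action, next_state], dim=-1))

bce = nn.BCEWithLogitsLoss()

def discriminator_losses(d_pi, d_m, expert_sa, policy_sa, real_sas, model_sas):
    # Plain binary cross-entropy on both discriminators; the paper instead
    # derives structured discriminators, which this sketch does not reproduce.
    expert_logits = d_pi(*expert_sa)
    policy_logits = d_pi(*policy_sa)
    real_logits = d_m(*real_sas)
    model_logits = d_m(*model_sas)
    loss_pi = (bce(expert_logits, torch.ones_like(expert_logits))
               + bce(policy_logits, torch.zeros_like(policy_logits)))
    loss_m = (bce(real_logits, torch.ones_like(real_logits))
              + bce(model_logits, torch.zeros_like(model_logits)))
    return loss_pi, loss_m

if __name__ == "__main__":
    # Smoke test with random tensors standing in for batched data.
    s_dim, a_dim, batch = 4, 2, 8
    d_pi = PolicyDiscriminator(s_dim, a_dim)
    d_m = ModelDiscriminator(s_dim, a_dim)
    rand = lambda d: torch.randn(batch, d)
    loss_pi, loss_m = discriminator_losses(
        d_pi, d_m,
        expert_sa=(rand(s_dim), rand(a_dim)),
        policy_sa=(rand(s_dim), rand(a_dim)),
        real_sas=(rand(s_dim), rand(a_dim), rand(s_dim)),
        model_sas=(rand(s_dim), rand(a_dim), rand(s_dim)),
    )
    print(loss_pi.item(), loss_m.item())

The second discriminator is what makes the approach model-based: transitions imagined by the learned model can be scored against real ones, so the policy can be improved with fewer interactions with the actual environment.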
Pages: 10922-10929
Page count: 8
Related papers
(50 records in total)
  • [1] Probabilistic model-based imitation learning
    Englert, Peter
    Paraschos, Alexandros
    Deisenroth, Marc Peter
    Peters, Jan
    [J]. ADAPTIVE BEHAVIOR, 2013, 21 (05): 388-403
  • [2] Model-Based Imitation Learning for Urban Driving
    Hu, Anthony
    Corrado, Gianluca
    Griffiths, Nicolas
    Murez, Zak
    Gurau, Corina
    Yeo, Hudson
    Kendall, Alex
    Cipolla, Roberto
    Shotton, Jamie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [3] A Probabilistic Framework for Model-Based Imitation Learning
    Shon, Aaron P.
    Grimes, David B.
    Baker, Chris L.
    Rao, Rajesh P. N.
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY, 2004: 1237-1242
  • [4] Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid
    Veith, Eric Msp
    Logemann, Torben
    Berezin, Aleksandr
    Wellssow, Arlena
    Balduin, Stephan
    [J]. 2024 12TH WORKSHOP ON MODELING AND SIMULATION OF CYBER-PHYSICAL ENERGY SYSTEMS, MSCPES, 2024
  • [5] Model-based Imitation Learning by Probabilistic Trajectory Matching
    Englert, Peter
    Paraschos, Alexandros
    Peters, Jan
    Deisenroth, Marc Peter
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2013: 1922-1927
  • [6] No Need for Interactions: Robust Model-Based Imitation Learning using Neural ODE
    Lin, HaoChih
    Li, Baopu
    Zhou, Xin
    Wang, Jiankun
    Meng, Max Q-H
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021: 11088-11094
  • [7] Model gradient: unified model and policy learning in model-based reinforcement learning
    Jia, Chengxing
    Zhang, Fuxiang
    Xu, Tian
    Pang, Jing-Cheng
    Zhang, Zongzhang
    Yu, Yang
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (04)
  • [8] Model-Based Offline Policy Optimization with Distribution Correcting Regularization
    Shen, Jian
    Chen, Mingcheng
    Zhang, Zhicheng
    Yang, Zhengyu
    Zhang, Weinan
    Yu, Yong
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, 2021, 12975: 174-189
  • [9] MobILE: Model-Based Imitation Learning From Observation Alone
    Kidambi, Rahul
    Chang, Jonathan D.
    Sun, Wen
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34