Model-Based Imitation Learning Using Entropy Regularization of Model and Policy

Cited by: 1
Author
Uchibe, Eiji [1 ]
Affiliation
[1] ATR Computat Neurosci Labs, Dept Brain Robot Interface, Kyoto 6190288, Japan
Keywords
Imitation learning; machine learning for robot control; reinforcement learning
DOI
10.1109/LRA.2022.3196139
CLC (Chinese Library Classification): TP24 [Robotics]
Discipline Codes: 080202; 1405
Abstract
Approaches based on generative adversarial networks for imitation learning are promising because they are sample efficient in terms of expert demonstrations. However, training a generator requires many interactions with the actual environment because model-free reinforcement learning is adopted to update the policy. To improve sample efficiency through model-based reinforcement learning, we propose Model-Based Entropy-Regularized Imitation Learning (MB-ERIL) under the entropy-regularized Markov decision process, which reduces the number of interactions with the actual environment. MB-ERIL uses two discriminators. A policy discriminator distinguishes the actions generated by a robot from expert ones, and a model discriminator distinguishes the counterfactual state transitions generated by the model from the actual ones. We derive structured discriminators so that learning the policy and the model is efficient. Computer simulations and real robot experiments show that MB-ERIL achieves competitive performance and significantly improves sample efficiency compared to baseline methods.
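The two-discriminator idea in the abstract can be illustrated with a hypothetical toy sketch (not the authors' implementation, and omitting the entropy-regularized policy/model updates): a policy discriminator is trained on (state, action) pairs to separate expert data from robot data, while a model discriminator is trained on (state, action, next-state) transitions to separate real transitions from counterfactual model rollouts. All data here is synthetic, and the linear logistic discriminators are stand-ins for the structured discriminators derived in the paper.

```python
# Toy sketch of MB-ERIL's two-discriminator setup (illustrative assumption,
# not the paper's code): logistic discriminators trained with binary
# cross-entropy to separate "real/expert" (label 1) from "generated" (label 0).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LogisticDiscriminator:
    """Minimal linear discriminator trained by gradient descent on BCE loss."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def prob_real(self, x):
        return sigmoid(x @ self.w)

    def update(self, real, fake):
        # Gradient of BCE: (p - 1) * x for real samples, p * x for fake samples.
        grad = real.T @ (self.prob_real(real) - 1.0) + fake.T @ self.prob_real(fake)
        self.w -= self.lr * grad / (len(real) + len(fake))

# Synthetic data: expert (state, action) pairs cluster at +1, robot pairs at -1;
# real (s, a, s') transitions cluster at +1, model-generated ones at -1.
expert_sa = rng.normal(loc=+1.0, scale=0.3, size=(256, 2))
robot_sa  = rng.normal(loc=-1.0, scale=0.3, size=(256, 2))
real_sas  = rng.normal(loc=+1.0, scale=0.3, size=(256, 3))
model_sas = rng.normal(loc=-1.0, scale=0.3, size=(256, 3))

policy_disc = LogisticDiscriminator(dim=2)  # distinguishes robot vs. expert actions
model_disc  = LogisticDiscriminator(dim=3)  # distinguishes model vs. real transitions
for _ in range(200):
    policy_disc.update(expert_sa, robot_sa)
    model_disc.update(real_sas, model_sas)

acc_policy = float(np.mean(policy_disc.prob_real(expert_sa) > 0.5))
acc_model  = float(np.mean(model_disc.prob_real(real_sas) > 0.5))
print(acc_policy, acc_model)
```

In the actual method, the discriminator outputs would feed back as reward signals for entropy-regularized policy and model updates; the sketch only shows the discrimination step.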
Pages: 10922-10929
Number of Pages: 8
Related Papers
50 records in total
  • [11] Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving
    Bronstein, Eli; Palatucci, Mark; Notz, Dominik; White, Brandyn; Kuefler, Alex; Lu, Yiren; Paul, Supratik; Nikdel, Payam; Mougin, Paul; Chen, Hongge; Fu, Justin; Abrams, Austin; Shah, Punit; Racah, Evan; Frenkel, Benjamin; Whiteson, Shimon; Anguelov, Dragomir
    2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022: 8652-8659
  • [12] Offline Model-based Adaptable Policy Learning
    Chen, Xiong-Hui; Yu, Yang; Li, Qingyang; Luo, Fan-Ming; Qin, Zhiwei; Shang, Wenjie; Ye, Jieping
    Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34
  • [13] Model-based Adversarial Imitation Learning from Demonstrations and Human Reward
    Huang, Jie; Hao, Jiangshan; Juan, Rongshun; Gomez, Randy; Nakamura, Keisuke; Li, Guangliang
    2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023: 1683-1690
  • [14] GeoGail: A Model-Based Imitation Learning Framework for Human Trajectory Synthesizing
    Wu, Yuchen; Wang, Huandong; Gao, Changzheng; Jin, Depeng; Li, Yong
    ACM Transactions on Knowledge Discovery from Data, 2024, 19 (01)
  • [15] Regularization and optimization in model-based clustering
    Sampaio, Raphael Araujo; Garcia, Joaquim Dias; Poggi, Marcus; Vidal, Thibaut
    Pattern Recognition, 2024, 150
  • [16] Type-2 Fuzzy Model-Based Movement Primitives for Imitation Learning
    Sun, Da; Liao, Qianfang; Loutfi, Amy
    IEEE Transactions on Robotics, 2022, 38 (04): 2462-2480
  • [17] Model-based Imitation Learning for Real-time Robot Navigation in Crowds
    Moder, Martin; Oezgan, Fatih; Pauli, Josef
    2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2023: 513-519
  • [18] Combining Model-Based Controllers and Generative Adversarial Imitation Learning for Traffic Simulation
    Chen, Haonan; Ji, Tianchen; Liu, Shuijing; Driggs-Campbell, Katherine
    2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), 2022: 1698-1704
  • [19] Model Learning and Model-Based Testing
    Aichernig, Bernhard K.; Mostowski, Wojciech; Mousavi, Mohammad Reza; Tappler, Martin; Taromirad, Masoumeh
    Machine Learning for Dynamic Software Analysis: Potentials and Limits, 2018, 11026: 74-100
  • [20] Model-based capacitated clustering with posterior regularization
    Mai, Feng; Fry, Michael J.; Ohlmann, Jeffrey W.
    European Journal of Operational Research, 2018, 271 (02): 594-605