Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning

Cited by: 52
Authors
Piot, Bilal [1 ]
Geist, Matthieu [2 ]
Pietquin, Olivier [3 ]
Affiliations
[1] Univ Lille 1, Cent Lille, INRIA, CNRS, UMR CRIStAL 9189, F-59000 Lille, France
[2] Univ Paris Saclay, UMI 2958, Georgia Tech, CNRS, Cent Supelec, F-57070 Metz, France
[3] Univ Lille 1, Cent Lille, INRIA, CNRS, UMR CRIStAL 9189, IUF, F-59000 Lille, France
Keywords
Imitation learning (IL); inverse reinforcement learning (IRL); learning from demonstrations (LfD)
DOI
10.1109/TNNLS.2016.2543000
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning from demonstrations is a paradigm by which an apprentice agent learns a control policy for a dynamic environment by observing demonstrations delivered by an expert agent. It is usually implemented as either imitation learning (IL) or inverse reinforcement learning (IRL) in the literature. On the one hand, IRL is a paradigm relying on Markov decision processes, where the goal of the apprentice agent is to find a reward function from the expert demonstrations that could explain the expert behavior. On the other hand, IL consists of directly generalizing the expert strategy, observed in the demonstrations, to unvisited states (and it is therefore close to classification when there is a finite set of possible decisions). While these two visions are often considered opposites, the purpose of this paper is to exhibit a formal link between these approaches from which new algorithms can be derived. We show that IL and IRL can be redefined in a way that makes them equivalent, in the sense that there exists an explicit bijective operator (namely, the inverse optimal Bellman operator) between their respective spaces of solutions. To do so, we introduce the set-policy framework, which creates a clear link between IL and IRL. As a result, IL and IRL solutions that make the best of both worlds are obtained. In addition, it is a unifying framework from which existing IL and IRL algorithms can be derived and which opens the way for IL methods able to deal with the environment's dynamics. Finally, the IRL algorithms derived from the set-policy framework are compared with the algorithms belonging to the more common trajectory-matching family. Experiments demonstrate that the set-policy-based algorithms outperform both the standard IRL and IL ones and result in more robust solutions.
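The inverse optimal Bellman operator mentioned in the abstract can be illustrated with a minimal sketch: given a Q-function (e.g., one fitted to expert demonstrations by a classifier) and the environment's transition model, a reward for which that Q-function satisfies the optimal Bellman equation is recovered as R(s, a) = Q(s, a) − γ · E_{s'}[max_{a'} Q(s', a')]. The MDP below and its numbers are purely illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy MDP: 2 states, 2 actions (illustrative values only).
gamma = 0.9

# P[s, a, s']: transition probabilities.
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
])

# A Q-function the apprentice might have fitted from expert data.
Q = np.array([
    [1.0, 0.5],
    [0.2, 0.8],
])

def inverse_bellman(Q, P, gamma):
    """Recover a reward making Q optimal:
    R(s, a) = Q(s, a) - gamma * sum_{s'} P(s'|s, a) * max_{a'} Q(s', a')."""
    V = Q.max(axis=1)          # greedy state values, shape (n_states,)
    return Q - gamma * (P @ V)  # P @ V contracts over s', shape (n_states, n_actions)

R = inverse_bellman(Q, P, gamma)
```

By construction, plugging R back into the optimal Bellman operator reproduces Q exactly, which is the sense in which the operator is bijective between reward functions and (optimal) Q-functions.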
Pages: 1814-1826
Page count: 13