Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning

Times cited: 52
Authors
Piot, Bilal [1]
Geist, Matthieu [2]
Pietquin, Olivier [3]
Affiliations
[1] Univ Lille 1, Cent Lille, INRIA, CNRS, UMR CRIStAL 9189, F-59000 Lille, France
[2] Univ Paris Saclay, UMI 2958, Georgia Tech, CNRS, Cent Supelec, F-57070 Metz, France
[3] Univ Lille 1, Cent Lille, INRIA, CNRS, UMR CRIStAL 9189, IUF, F-59000 Lille, France
Keywords
Imitation learning (IL); inverse reinforcement learning (IRL); learning from demonstrations (LfD)
DOI
10.1109/TNNLS.2016.2543000
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning from demonstrations is a paradigm in which an apprentice agent learns a control policy for a dynamic environment by observing demonstrations delivered by an expert agent. In the literature, it is usually implemented as either imitation learning (IL) or inverse reinforcement learning (IRL). On the one hand, IRL relies on the Markov decision process framework: the goal of the apprentice agent is to find, from the expert demonstrations, a reward function that could explain the expert behavior. On the other hand, IL consists in directly generalizing the expert strategy observed in the demonstrations to unvisited states (it is therefore close to classification when there is a finite set of possible decisions). While these two visions are often considered opposite to each other, the purpose of this paper is to exhibit a formal link between these approaches from which new algorithms can be derived. We show that IL and IRL can be redefined so that they are equivalent, in the sense that there exists an explicit bijective operator (namely, the inverse optimal Bellman operator) between their respective spaces of solutions. To do so, we introduce the set-policy framework, which creates a clear link between IL and IRL. As a result, we obtain IL and IRL solutions that make the best of both worlds. In addition, the set-policy framework is a unifying framework from which existing IL and IRL algorithms can be derived, and it opens the way for IL methods able to deal with the environment's dynamics. Finally, the IRL algorithms derived from the set-policy framework are compared with algorithms belonging to the more common trajectory-matching family. Experiments demonstrate that the set-policy-based algorithms outperform both standard IRL and standard IL algorithms and yield more robust solutions.
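The equivalence between IL and IRL solutions described in the abstract hinges on the inverse optimal Bellman operator. The following is a minimal sketch, not the authors' code, assuming a small finite MDP whose transition probabilities are known and stored as nested lists `p[s][a][s2]`. Given a score function Q (for instance, the output of an IL classifier), it computes the standard form of the operator, R(s,a) = Q(s,a) - γ Σ_{s'} P(s'|s,a) max_b Q(s',b), and then checks by value iteration that Q is the optimal action-value function of R, so the greedy policy of Q is optimal for the recovered reward. All function names (`inverse_optimal_bellman`, `optimal_bellman`, `value_iteration`, `greedy`) and the example MDP are hypothetical illustrations.

```python
# Hypothetical sketch: inverse optimal Bellman operator on a tabular MDP.
# Assumes finite states/actions and known transitions p[s][a][s2].

def inverse_optimal_bellman(q, p, gamma):
    """Return r with r[s][a] = q[s][a] - gamma * E_{s2~p(.|s,a)}[max_b q[s2][b]]."""
    n_states = len(q)
    v = [max(q[s2]) for s2 in range(n_states)]  # greedy value of Q in each state
    return [
        [q[s][a] - gamma * sum(p[s][a][s2] * v[s2] for s2 in range(n_states))
         for a in range(len(q[s]))]
        for s in range(n_states)
    ]


def optimal_bellman(q, r, p, gamma):
    """One application of the optimal Bellman operator T* for reward r."""
    n_states = len(q)
    v = [max(q[s2]) for s2 in range(n_states)]
    return [
        [r[s][a] + gamma * sum(p[s][a][s2] * v[s2] for s2 in range(n_states))
         for a in range(len(q[s]))]
        for s in range(n_states)
    ]


def value_iteration(r, p, gamma, iters=2000):
    """Approximate the optimal Q-function of reward r by repeatedly applying T*."""
    q = [[0.0] * len(row) for row in r]
    for _ in range(iters):
        q = optimal_bellman(q, r, p, gamma)
    return q


def greedy(q):
    """Deterministic greedy policy: index of the best-scoring action per state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    # Hypothetical 3-state, 2-action MDP; each p[s][a] sums to 1.
    p = [
        [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
        [[0.3, 0.3, 0.4], [0.0, 0.5, 0.5]],
        [[1.0, 0.0, 0.0], [0.2, 0.2, 0.6]],
    ]
    q = [[1.0, 0.5], [0.2, 0.8], [0.0, -1.0]]  # e.g. scores from an IL classifier
    r = inverse_optimal_bellman(q, p, gamma=0.9)
    q_back = value_iteration(r, p, gamma=0.9)
    print(greedy(q), greedy(q_back))  # both [0, 1, 0]
```

Because applying T* for the reward R = J*Q to Q gives back Q exactly, Q is the unique fixed point that value iteration converges to; this is the sense in which the map from score functions to rewards can be inverted. Plain lists keep the sketch dependency-free; a practical version would vectorize these sums.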
Pages: 1814-1826
Page count: 13