Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning

Cited by: 52
Authors
Piot, Bilal [1]
Geist, Matthieu [2]
Pietquin, Olivier [3]
Affiliations
[1] Univ Lille 1, Cent Lille, INRIA, CNRS, UMR CRIStAL 9189, F-59000 Lille, France
[2] Univ Paris Saclay, UMI 2958, Georgia Tech, CNRS, Cent Supelec, F-57070 Metz, France
[3] Univ Lille 1, Cent Lille, INRIA, CNRS, UMR CRIStAL 9189, IUF, F-59000 Lille, France
Keywords
Imitation learning (IL); inverse reinforcement learning (IRL); learning from demonstrations (LfD)
DOI
10.1109/TNNLS.2016.2543000
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning from demonstrations is a paradigm in which an apprentice agent learns a control policy for a dynamic environment by observing demonstrations delivered by an expert agent. In the literature, it is usually implemented as either imitation learning (IL) or inverse reinforcement learning (IRL). On the one hand, IRL relies on the Markov decision process framework: the apprentice's goal is to find, from the expert demonstrations, a reward function that explains the expert behavior. On the other hand, IL consists of directly generalizing the expert strategy, observed in the demonstrations, to unvisited states (and is therefore close to classification when there is a finite set of possible decisions). While these two views are often considered as opposed to each other, the purpose of this paper is to exhibit a formal link between them from which new algorithms can be derived. We show that IL and IRL can be redefined so that they are equivalent, in the sense that there exists an explicit bijective operator (namely, the inverse optimal Bellman operator) between their respective spaces of solutions. To do so, we introduce the set-policy framework, which creates a clear link between IL and IRL. As a result, IL and IRL solutions that make the best of both worlds are obtained. In addition, it is a unifying framework from which existing IL and IRL algorithms can be derived and which opens the way for IL methods able to account for the environment's dynamics. Finally, the IRL algorithms derived from the set-policy framework are compared with algorithms belonging to the more common trajectory-matching family. Experiments demonstrate that the set-policy-based algorithms outperform both standard IRL and IL algorithms and yield more robust solutions.
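The pivotal object named in the abstract, the inverse optimal Bellman operator, can be sketched in standard discounted-MDP notation; this is a minimal illustration using textbook definitions, not necessarily the paper's exact formulation, and the state space S, action space A, transition kernel P, and discount factor gamma below are assumptions of the sketch. The optimal Bellman equation maps a reward R to its optimal action-value function Q^*:

\[
Q^*(s,a) \;=\; R(s,a) \;+\; \gamma \sum_{s' \in S} P(s' \mid s,a)\, \max_{a' \in A} Q^*(s',a'),
\]

and solving this relation for R maps any bounded function Q back to the unique reward for which Q is the optimal action-value function:

\[
R_Q(s,a) \;=\; Q(s,a) \;-\; \gamma \sum_{s' \in S} P(s' \mid s,a)\, \max_{a' \in A} Q(s',a').
\]

A one-to-one correspondence of this kind, between score functions over expert decisions (the natural output of an IL method) and reward functions (the object an IRL method searches for), is what allows solutions of one problem to be translated into solutions of the other, which is the role the set-policy framework plays in the paper.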
Pages: 1814 - 1826
Page count: 13