Provably Efficient Imitation Learning from Observation Alone

Cited: 0
Authors
Sun, Wen [1 ]
Vemula, Anirudh [1 ]
Boots, Byron [2 ]
Bagnell, J. Andrew [3 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Aurora Innovat, Pittsburgh, PA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
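The abstract's core objective is matching the learner's per-time-step observation distribution to the expert's under an Integral Probability Metric (IPM). As an illustrative sketch only (not the paper's implementation), the snippet below estimates one standard IPM instance, the kernel Maximum Mean Discrepancy, between two observation samples; the RBF bandwidth, sample sizes, and toy Gaussian "observations" are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    # Biased empirical MMD^2: the IPM whose function class is the
    # unit ball of the RKHS induced by the kernel above.
    kxx = rbf_kernel(X, X, bandwidth).mean()
    kyy = rbf_kernel(Y, Y, bandwidth).mean()
    kxy = rbf_kernel(X, Y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
expert_obs = rng.normal(0.0, 1.0, size=(200, 4))  # stand-in for expert observations
matched    = rng.normal(0.0, 1.0, size=(200, 4))  # learner close to the expert
mismatched = rng.normal(2.0, 1.0, size=(200, 4))  # learner far from the expert

# A learner whose observation distribution matches the expert's
# scores a smaller IPM value, which is what FAIL drives down per time step.
print(mmd(expert_obs, matched) < mmd(expert_obs, mismatched))
```

In FAIL itself the metric is minimized adversarially over a learned discriminator class rather than computed in closed form as here; this sketch only shows what "distance between observation distributions" means operationally.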
Pages: 10
Related Papers (50 results)
  • [21] You, Ke; Ding, Lieyun; Dou, Quanli; Jiang, Yutian; Wu, Zhangang; Zhou, Cheng. An imitation from observation approach for dozing distance learning in autonomous bulldozer operation. ADVANCED ENGINEERING INFORMATICS, 2022, 54.
  • [22] Park, Jongcheon; Han, Seungyong; Lee, S. M. Restored Action Generative Adversarial Imitation Learning from observation for robot manipulator. ISA TRANSACTIONS, 2022, 129: 684-690.
  • [23] Fan, Yue; Chu, Shilei; Zhang, Wei; Song, Ran; Li, Yibin. Learn by Observation: Imitation Learning for Drone Patrolling from Videos of A Human Navigator. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 5209-5216.
  • [24] Tarbouriech, Jean; Pirotta, Matteo; Valko, Michal; Lazaric, Alessandro. A Provably Efficient Sample Collection Strategy for Reinforcement Learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34.
  • [25] Boulle, Nicolas; Halikias, Diana; Townsend, Alex. Elliptic PDE learning is provably data-efficient. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2023, 120 (39).
  • [26] Jin, Chi; Yang, Zhuoran; Wang, Zhaoran; Jordan, Michael I. Provably Efficient Reinforcement Learning with Linear Function Approximation. MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (03): 1496-1521.
  • [27] Zhu, Hanlin; Wang, Ruosong; Lee, Jason D. Provably Efficient Reinforcement Learning via Surprise Bound. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206.
  • [28] Zanette, Andrea; Wainwright, Martin J. Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022.
  • [29] Demiris, Yiannis; Billard, Aude. Special issue on robot learning by observation, demonstration, and imitation. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2007, 37 (02): 254-255.
  • [30] Liu, YuXuan; Gupta, Abhishek; Abbeel, Pieter; Levine, Sergey. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018: 1118-1125.