Provably Efficient Imitation Learning from Observation Alone

Cited by: 0
Authors
Sun, Wen [1]
Vemula, Anirudh [1]
Boots, Byron [2]
Bagnell, J. Andrew [3]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Aurora Innovat, Pittsburgh, PA USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm for the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
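The abstract describes FAIL as matching the learner's observation distribution to the expert's by minimizing an Integral Probability Metric (IPM), d_F(p, q) = sup_{f in F} | E_{x~p}[f(x)] - E_{x~q}[f(x)] |, at each time step. The sketch below is only a rough, self-contained illustration of that comparison step, not the paper's actual algorithm, function class, or experimental setup: it estimates an IPM over a small hand-picked function class from samples and keeps the candidate whose induced observations are closest to the expert's. All names and data here (ipm, fn_class, policy_a, policy_b, the Gaussian observations) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ipm(expert_obs, learner_obs, fn_class):
    """Empirical IPM over a finite function class F:
    d_F(p, q) = max_{f in F} | mean f(expert samples) - mean f(learner samples) |."""
    gaps = [abs(np.mean(f(expert_obs)) - np.mean(f(learner_obs))) for f in fn_class]
    return max(gaps)

# Hypothetical 1-D observations standing in for the expert's rollouts at one time step.
expert_obs = rng.normal(loc=1.0, scale=0.5, size=500)

# A tiny "discriminator" class of bounded functions (purely illustrative).
fn_class = [lambda x, w=w: np.clip(w * x, -1.0, 1.0) for w in (-1.0, -0.5, 0.5, 1.0)]

# Each candidate policy is represented only by the observations it induces at that step;
# a FAIL-style selection keeps the candidate whose observation distribution is closest in IPM.
candidates = {
    "policy_a": rng.normal(loc=0.0, scale=0.5, size=500),
    "policy_b": rng.normal(loc=0.9, scale=0.6, size=500),
}
best = min(candidates, key=lambda name: ipm(expert_obs, candidates[name], fn_class))
print(best)  # expected "policy_b": its observations better match the expert's
```

In the paper's setting this comparison is performed per time step, using the expert's observations at the next step and an adversarially chosen discriminator class, so that a sequence of time-dependent policies is learned; the sketch shows only the IPM-comparison step with a fixed function class.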
Pages: 10
Related Papers
50 records in total
  • [1] Zweig, Aaron; Bruna, Joan. Provably Efficient Third-Person Imitation from Offline Observation. Conference on Uncertainty in Artificial Intelligence (UAI 2020), 2020, 124: 1228-1237.
  • [2] Xu, Tian; Li, Ziniu; Yu, Yang; Luo, Zhi-Quan. Provably Efficient Adversarial Imitation Learning with Unknown Transitions. Uncertainty in Artificial Intelligence, 2023, 216: 2367-2378.
  • [3] Kidambi, Rahul; Chang, Jonathan D.; Sun, Wen. MobILE: Model-Based Imitation Learning From Observation Alone. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
  • [4] Liu, Zhihan; Zhang, Yufeng; Fu, Zuyue; Yang, Zhuoran; Wang, Zhaoran. Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation. International Conference on Machine Learning, Vol 162, 2022.
  • [5] Torabi, Faraz. Imitation Learning from Observation. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 9900-9901.
  • [6] Stone, Peter. Efficient Robot Skill Learning: Grounded Simulation Learning and Imitation Learning from Observation. 2021 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), 2021: 3-3.
  • [7] Torabi, Faraz; Warnell, Garrett; Stone, Peter. DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021: 2391-2397.
  • [8] Torabi, Faraz; Warnell, Garrett; Stone, Peter. Recent Advances in Imitation Learning from Observation. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019: 6325-6331.
  • [9] Metelli, Alberto Maria; Ramponi, Giorgia; Concetti, Alessandro; Restelli, Marcello. Provably Efficient Learning of Transferable Rewards. International Conference on Machine Learning, Vol 139, 2021, 139.
  • [10] Jin, Chi; Allen-Zhu, Zeyuan; Bubeck, Sebastien; Jordan, Michael I. Is Q-learning Provably Efficient? Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, 31.