Provably Efficient Imitation Learning from Observation Alone

Cited: 0
Authors
Sun, Wen [1 ]
Vemula, Anirudh [1 ]
Boots, Byron [2 ]
Bagnell, J. Andrew [3 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Aurora Innovat, Pittsburgh, PA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
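The abstract's core objective is matching the learner's per-time-step observation distribution to the expert's under an Integral Probability Metric (IPM). As an illustrative sketch only (not the paper's implementation), the snippet below estimates one standard IPM instance, the kernel Maximum Mean Discrepancy, between two observation samples; the RBF bandwidth, sample sizes, and toy Gaussian "observations" are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    # Biased empirical MMD^2: the IPM whose function class is the
    # unit ball of the RKHS induced by the kernel above.
    kxx = rbf_kernel(X, X, bandwidth).mean()
    kyy = rbf_kernel(Y, Y, bandwidth).mean()
    kxy = rbf_kernel(X, Y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
expert_obs = rng.normal(0.0, 1.0, size=(200, 4))  # stand-in for expert observations
matched    = rng.normal(0.0, 1.0, size=(200, 4))  # learner close to the expert
mismatched = rng.normal(2.0, 1.0, size=(200, 4))  # learner far from the expert

# A learner whose observation distribution matches the expert's
# scores a smaller IPM value, which is what FAIL drives down per time step.
print(mmd(expert_obs, matched) < mmd(expert_obs, mismatched))
```

In FAIL itself the metric is minimized adversarially over a learned discriminator class rather than computed in closed form as here; this sketch only shows what "distance between observation distributions" means operationally.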
Pages: 10
Related Papers (50 results)
  • [21] You, Ke; Ding, Lieyun; Dou, Quanli; Jiang, Yutian; Wu, Zhangang; Zhou, Cheng. An imitation from observation approach for dozing distance learning in autonomous bulldozer operation. ADVANCED ENGINEERING INFORMATICS, 2022, 54.
  • [22] Park, Jongcheon; Han, Seungyong; Lee, S. M. Restored Action Generative Adversarial Imitation Learning from observation for robot manipulator. ISA TRANSACTIONS, 2022, 129: 684-690.
  • [23] Fan, Yue; Chu, Shilei; Zhang, Wei; Song, Ran; Li, Yibin. Learn by Observation: Imitation Learning for Drone Patrolling from Videos of A Human Navigator. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 5209-5216.
  • [24] Tarbouriech, Jean; Pirotta, Matteo; Valko, Michal; Lazaric, Alessandro. A Provably Efficient Sample Collection Strategy for Reinforcement Learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34.
  • [25] Boulle, Nicolas; Halikias, Diana; Townsend, Alex. Elliptic PDE learning is provably data-efficient. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2023, 120 (39).
  • [26] Jin, Chi; Yang, Zhuoran; Wang, Zhaoran; Jordan, Michael I. Provably Efficient Reinforcement Learning with Linear Function Approximation. MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (03): 1496-1521.
  • [27] Zhu, Hanlin; Wang, Ruosong; Lee, Jason D. Provably Efficient Reinforcement Learning via Surprise Bound. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206.
  • [28] Zanette, Andrea; Wainwright, Martin J. Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022.
  • [29] Demiris, Yiannis; Billard, Aude. Special issue on robot learning by observation, demonstration, and imitation. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2007, 37 (02): 254-255.
  • [30] Liu, YuXuan; Gupta, Abhishek; Abbeel, Pieter; Levine, Sergey. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018: 1118-1125.