Provably Efficient Imitation Learning from Observation Alone

Cited by: 0
Authors
Sun, Wen [1 ]
Vemula, Anirudh [1 ]
Boots, Byron [2 ]
Bagnell, J. Andrew [3 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Aurora Innovat, Pittsburgh, PA USA
Keywords: (none listed)
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting, learning a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
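For concreteness, the Integral Probability Metric referenced in the abstract has the standard form sketched below. The notation here is not taken from the paper itself: \mathcal{F} denotes an assumed class of discriminator functions, and P_h^{\pi} and P_h^{E} denote the step-h observation distributions induced by the learner's policy and by the expert, respectively.

    d_{\mathcal{F}}\left(P_h^{\pi}, P_h^{E}\right) \;=\; \sup_{f \in \mathcal{F}} \Big| \, \mathbb{E}_{x \sim P_h^{\pi}}\big[f(x)\big] \;-\; \mathbb{E}_{x \sim P_h^{E}}\big[f(x)\big] \, \Big|

As described in the abstract, FAIL learns one policy per time step, choosing each policy so that this quantity is driven toward zero at every step h, which is why the learned policies are time-dependent.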
Pages: 10
Related Papers (50 records in total)
  • [31] Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
    Zhong, Han
    Xiong, Wei
    Tan, Jiyuan
    Wang, Liwei
    Zhang, Tong
    Wang, Zhaoran
    Yang, Zhuoran
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [32] Model-based Imitation Learning from Observation for input estimation in monitored systems
    Liu, Wei
    Lai, Zhilu
    Stoura, Charikleia D.
    Bacsa, Kiran
    Chatzi, Eleni
    MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2025, 225
  • [33] Sample-efficient Adversarial Imitation Learning
    Jung, Dahuin
    Lee, Hyungyu
    Yoon, Sungroh
JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 32
  • [34] NAO Robot Learns to Interact with Humans through Imitation Learning from Video Observation
    Kolagar, Seyed Adel Alizadeh
    Taheri, Alireza
    Meghdari, Ali F.
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2023, 109 (01)
  • [36] On Efficient Online Imitation Learning via Classification
    Li, Yichen
    Zhang, Chicheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [38] Efficient Imitation Learning with Conservative World Models
    Kolev, Victor
    Rafailov, Rafael
    Hatch, Kyle
    Wu, Jiajun
    Finn, Chelsea
    6TH ANNUAL LEARNING FOR DYNAMICS & CONTROL CONFERENCE, 2024, 242 : 1776 - 1789
  • [40] Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
    Kong, Dingwen
    Yang, Lin F.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022