Provably Efficient Imitation Learning from Observation Alone

Cited by: 0
Authors
Sun, Wen [1]
Vemula, Anirudh [1]
Boots, Byron [2]
Bagnell, J. Andrew [3]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[3] Aurora Innovat, Pittsburgh, PA USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm for the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
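The abstract describes FAIL as matching the learner's observation distribution to the expert's by minimizing an Integral Probability Metric (IPM), d_F(p, q) = sup_{f in F} | E_{x~p}[f(x)] - E_{x~q}[f(x)] |, at each time step. The sketch below is only a rough, self-contained illustration of that comparison step, not the paper's actual algorithm, function class, or experimental setup: it estimates an IPM over a small hand-picked function class from samples and keeps the candidate whose induced observations are closest to the expert's. All names and data here (ipm, fn_class, policy_a, policy_b, the Gaussian observations) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ipm(expert_obs, learner_obs, fn_class):
    """Empirical IPM over a finite function class F:
    d_F(p, q) = max_{f in F} | mean f(expert samples) - mean f(learner samples) |."""
    gaps = [abs(np.mean(f(expert_obs)) - np.mean(f(learner_obs))) for f in fn_class]
    return max(gaps)

# Hypothetical 1-D observations standing in for the expert's rollouts at one time step.
expert_obs = rng.normal(loc=1.0, scale=0.5, size=500)

# A tiny "discriminator" class of bounded functions (purely illustrative).
fn_class = [lambda x, w=w: np.clip(w * x, -1.0, 1.0) for w in (-1.0, -0.5, 0.5, 1.0)]

# Each candidate policy is represented only by the observations it induces at that step;
# a FAIL-style selection keeps the candidate whose observation distribution is closest in IPM.
candidates = {
    "policy_a": rng.normal(loc=0.0, scale=0.5, size=500),
    "policy_b": rng.normal(loc=0.9, scale=0.6, size=500),
}
best = min(candidates, key=lambda name: ipm(expert_obs, candidates[name], fn_class))
print(best)  # expected "policy_b": its observations better match the expert's
```

In the paper's setting this comparison is performed per time step, using the expert's observations at the next step and an adversarially chosen discriminator class, so that a sequence of time-dependent policies is learned; the sketch shows only the IPM-comparison step with a fixed function class.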
Pages: 10
Related Papers
50 records in total
  • [1] Zweig, Aaron; Bruna, Joan. Provably Efficient Third-Person Imitation from Offline Observation. Conference on Uncertainty in Artificial Intelligence (UAI 2020), 2020, 124: 1228-1237.
  • [2] Xu, Tian; Li, Ziniu; Yu, Yang; Luo, Zhi-Quan. Provably Efficient Adversarial Imitation Learning with Unknown Transitions. Uncertainty in Artificial Intelligence, 2023, 216: 2367-2378.
  • [3] Kidambi, Rahul; Chang, Jonathan D.; Sun, Wen. MobILE: Model-Based Imitation Learning From Observation Alone. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
  • [4] Liu, Zhihan; Zhang, Yufeng; Fu, Zuyue; Yang, Zhuoran; Wang, Zhaoran. Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation. International Conference on Machine Learning, Vol 162, 2022.
  • [5] Torabi, Faraz. Imitation Learning from Observation. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 9900-9901.
  • [6] Stone, Peter. Efficient Robot Skill Learning: Grounded Simulation Learning and Imitation Learning from Observation. 2021 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), 2021: 3-3.
  • [7] Torabi, Faraz; Warnell, Garrett; Stone, Peter. DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021: 2391-2397.
  • [8] Torabi, Faraz; Warnell, Garrett; Stone, Peter. Recent Advances in Imitation Learning from Observation. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019: 6325-6331.
  • [9] Metelli, Alberto Maria; Ramponi, Giorgia; Concetti, Alessandro; Restelli, Marcello. Provably Efficient Learning of Transferable Rewards. International Conference on Machine Learning, Vol 139, 2021, 139.
  • [10] Jin, Chi; Allen-Zhu, Zeyuan; Bubeck, Sebastien; Jordan, Michael I. Is Q-learning Provably Efficient? Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, 31.