Adversarial Imitation Learning from Incomplete Demonstrations

Cited by: 0

Authors:

Sun, Mingfei [1]

Xiaojuan [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China

Source:

PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2019

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Imitation learning targets deriving a mapping from states to actions, a.k.a. policy, from expert demonstrations. Existing methods for imitation learning typically require any actions in the demonstrations to be fully available, which is hard to ensure in real applications. Though algorithms for learning with unobservable actions have been proposed, they focus solely on state information and overlook the fact that the action sequence could still be partially available and provide useful information for policy deriving. In this paper, we propose a novel algorithm called Action-Guided Adversarial Imitation Learning (AGAIL) that learns a policy from demonstrations with incomplete action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to separate demonstrations into state and action trajectories, and train a policy with state trajectories while using actions as auxiliary information to guide the training whenever applicable. Built upon the Generative Adversarial Imitation Learning, AGAIL has three components: a generator, a discriminator, and a guide. The generator learns a policy with rewards provided by the discriminator, which tries to distinguish state distributions between demonstrations and samples generated by the policy. The guide provides additional rewards to the generator when demonstrated actions for specific states are available. We compare AGAIL to other methods on benchmark tasks and show that AGAIL consistently delivers comparable performance to the state-of-the-art methods even when the action sequence in demonstrations is only partially available.
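The reward structure described in the abstract (a discriminator reward on states, plus a guide reward only where a demonstrated action exists) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `agail_style_reward`, the `-log(1 - D(s))` reward form, the additive combination, and the `guide_weight` parameter are all hypothetical.

```python
import numpy as np


def agail_style_reward(d_scores, guide_signal, action_available, guide_weight=0.5):
    """Combine a state-only discriminator reward with an optional guide bonus.

    All conventions below are illustrative assumptions, not the paper's method.

    d_scores         : discriminator outputs in (0, 1); assumed to be the
                       probability that a state comes from the demonstrations.
    guide_signal     : per-step guide term (for example, a log-likelihood of
                       the demonstrated action); entries for steps without a
                       demonstrated action are ignored.
    action_available : boolean mask, True where a demonstrated action exists.
    guide_weight     : hypothetical scaling factor for the guide bonus.
    """
    d = np.clip(np.asarray(d_scores, dtype=float), 1e-8, 1.0 - 1e-8)
    guide = np.asarray(guide_signal, dtype=float)
    mask = np.asarray(action_available, dtype=bool)

    # Reward from the discriminator, computed from states only.
    base = -np.log(1.0 - d)

    # Guide bonus applies only at steps where the action was demonstrated.
    bonus = np.where(mask, guide, 0.0)

    return base + guide_weight * bonus
```

With this sketch, steps whose demonstrated action is missing fall back to the state-only discriminator reward, which mirrors the "whenever applicable" wording of the abstract.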


Pages: 3513 - 3519

Number of pages: 7

Related records: 50 in total

[21] Learning from Imperfect Demonstrations via Adversarial Confidence Transfer
Cao, Zhangjie
Wang, Zihan
Sadigh, Dorsa
2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022
[22] Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations
Xu, Haoran
Zhan, Xianyuan
Yin, Honglei
Qin, Huiling
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
[23] Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality
Zhang, Songyuan
Cao, Zhangjie
Sadigh, Dorsa
Sui, Yanan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[24] BAGAIL: Multi-modal imitation learning from imbalanced demonstrations
Gu, Sijia
Zhu, Fei
NEURAL NETWORKS, 2024, 174
[25] What Matters for Adversarial Imitation Learning?
Orsini, Manu
Raichuk, Anton
Hussenot, Leonard
Vincent, Damien
Dadashi, Robert
Girgin, Sertan
Geist, Matthieu
Bachem, Olivier
Pietquin, Olivier
Andrychowicz, Marcin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
[26] Quantum generative adversarial imitation learning
Xiao, Tailong
Huang, Jingzheng
Li, Hongjing
Fan, Jianping
Zeng, Guihua
NEW JOURNAL OF PHYSICS, 2023, 25 (03)
[27] DiffAIL: Diffusion Adversarial Imitation Learning
Wang, Bingzheng
Wu, Guoqiang
Pang, Teng
Zhang, Yan
Yin, Yilong
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 15447 - 15455
[28] Generative Adversarial Network for Imitation Learning from Single Demonstration
Tho Nguyen Duc
Chanh Minh Tran
Phan Xuan Tan
Kamioka, Eiji
BAGHDAD SCIENCE JOURNAL, 2021, 18 (04) : 1350 - 1355
[29] Deterministic generative adversarial imitation learning
Zuo, Guoyu
Chen, Kexin
Lu, Jiahao
Huang, Xiangsheng
NEUROCOMPUTING, 2020, 388 : 60 - 69
[30] Adversarial Imitation Learning from Video using a State Observer
Karnan, Haresh
Torabi, Faraz
Warnell, Garrett
Stone, Peter
2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022, : 2452 - 2458
