Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning

Authors
Wang, Yunke [1, 2]
Du, Bo [1, 2]
Xu, Chang [3]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Natl Engn Res Ctr Multimedia Software, Inst Artificial Intelligence, Wuhan, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Peoples R China
[3] Univ Sydney, Sch Comp Sci, Fac Engn, Sydney, Australia
Funding
Australian Research Council; National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Adversarial imitation learning has become a widely used imitation learning framework. The discriminator is typically trained by treating expert demonstrations and policy trajectories as examples from two classes (positive vs. negative), and the policy is then expected to produce trajectories that are indistinguishable from the expert demonstrations. In the real world, however, collected expert demonstrations are more likely to be imperfect, with only an unknown fraction of them being optimal. Instead of treating imperfect expert demonstrations as absolutely positive or negative, we investigate unlabeled imperfect expert demonstrations as they are. A positive-unlabeled adversarial imitation learning algorithm is developed to dynamically sample expert demonstrations that match the trajectories of the continually optimized agent policy well. The trajectories of an initial agent policy may be closer to the non-optimal expert demonstrations, but within the framework of adversarial imitation learning, the agent policy is optimized to fool the discriminator and to produce trajectories similar to the optimal expert demonstrations. Theoretical analysis shows that our method learns from the imperfect demonstrations in a self-paced way. Experimental results on the MuJoCo and RoboSuite platforms demonstrate the effectiveness of our method from different aspects.
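As a rough illustration of what a positive-unlabeled objective for a discriminator can look like, the sketch below implements the generic non-negative PU risk estimator with a logistic loss. This is a standard technique from the PU-learning literature swapped in for illustration; this record does not state the paper's exact objective, nor which samples play the "positive" and "unlabeled" roles, so the function names, the argument names (pos_scores, unl_scores) and the class prior are all assumptions.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mean_logistic_loss(scores, label):
    # Average logistic loss log(1 + exp(-label * score)), with label in {+1, -1}.
    return sum(softplus(-label * s) for s in scores) / len(scores)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for a binary scorer (hypothetical helper).

    pos_scores: discriminator outputs on samples known to be positive.
    unl_scores: discriminator outputs on unlabeled samples.
    prior:      assumed fraction of positives among the unlabeled samples.
    """
    pos_as_pos = mean_logistic_loss(pos_scores, +1)
    pos_as_neg = mean_logistic_loss(pos_scores, -1)
    unl_as_neg = mean_logistic_loss(unl_scores, -1)
    # The unlabeled set is a mixture, so the negative-class risk is estimated
    # by subtracting the positive share; clamp at zero to limit overfitting.
    neg_risk = unl_as_neg - prior * pos_as_neg
    return prior * pos_as_pos + max(0.0, neg_risk)
```

In an adversarial imitation loop, a risk of this kind would be minimized with respect to the discriminator parameters in place of the usual two-class cross-entropy; how the prior is chosen or estimated is left open here.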
Pages: 10262-10270
Page count: 9
Related papers
50 in total
  • [1] Efficient Off-policy Adversarial Imitation Learning with Imperfect Demonstrations
    Li, Jiangeng
    Zhao, Qishen
    Huang, Shuai
    Zuo, Guoyu
    [J]. PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 1692 - 1697
  • [2] Best-in-class imitation: Non-negative positive-unlabeled imitation learning from imperfect demonstrations
    Zhang, Lin
    Zhu, Fei
    Ling, Xinghong
    Liu, Quan
    [J]. INFORMATION SCIENCES, 2022, 601 : 71 - 89
  • [3] Programmatic Imitation Learning From Unlabeled and Noisy Demonstrations
    Xin, Jimmy
    Zheng, Linus
    Rahmani, Kia
    Wei, Jiayi
    Holtz, Jarrett
    Dillig, Isil
    Biswas, Joydeep
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (06): : 4894 - 4901
  • [4] Adversarial Imitation Learning from Incomplete Demonstrations
    Sun, Mingfei
    Xiaojuan
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3513 - 3519
  • [5] Adversarial Imitation Learning from State-only Demonstrations
    Torabi, Faraz
    Warnell, Garrett
    Stone, Peter
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2229 - 2231
  • [6] Adversarial imitation learning with mixed demonstrations from multiple demonstrators
    Zuo, Guoyu
    Zhao, Qishen
    Huang, Shuai
    Li, Jiangeng
    Gong, Daoxiong
    [J]. NEUROCOMPUTING, 2021, 457 (457) : 365 - 376
  • [7] Learning from Imperfect Demonstrations via Adversarial Confidence Transfer
    Cao, Zhangjie
    Wang, Zihan
    Sadigh, Dorsa
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022,
  • [8] Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations
    Wang, Yunke
    Xu, Chang
    Du, Bo
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3155 - 3161
  • [9] Imitation learning from imperfect demonstrations for AUV path tracking and obstacle avoidance
    Chen, Tianhao
    Zhang, Zheng
    Fang, Zheng
    Jiang, Dong
    Li, Guangliang
    [J]. OCEAN ENGINEERING, 2024, 298
  • [10] Model-based Adversarial Imitation Learning from Demonstrations and Human Reward
    Huang, Jie
    Hao, Jiangshan
    Juan, Rongshun
    Gomez, Randy
    Nakamura, Keisuke
    Li, Guangliang
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023, : 1683 - 1690