Learning from Suboptimal Demonstration via Trajectory-Ranked Adversarial Imitation

Cited by: 0
Authors
Chen, Luyao [1]
Xie, Shaorong [1]
Pang, Tao [2]
Yu, Hang [1]
Luo, Xiangfeng [1]
Zhang, Zhenyu [1]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
[2] China Elect Technol Grp Corp, Res Inst 32, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Reinforcement learning; Imitation learning; Suboptimal demonstration; Trajectory-Ranked;
DOI
10.1109/ICTAI56018.2022.00078
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Robots trained by Imitation Learning (IL) are used in many tasks (e.g., autonomous vehicle manipulation). Generative Adversarial Imitation Learning (GAIL) assumes that the demonstration set used for training is of high quality. However, such demonstrations are difficult and expensive to obtain. GAIL-related methods fail to learn effective strategies from lower-quality demonstrations because the performance of agents trained by these methods is limited by the demonstrator's operations. Our idea is to enable the agent to learn, from a suboptimal demonstration set containing lower-quality demonstrations that are easier to obtain, a strategy that performs better than the demonstrator. To this end, we propose the Trajectory-Ranked Adversarial Imitation Learning (TRAIL) method. First, for demonstration set processing, we introduce a ranking process and define the concept of Performance Relative Advantage of suboptimal demonstrations to specify the ranking order. Second, for model training, we reconstruct the objective function of GAIL and use an experience replay buffer, enabling the agent to learn implicit features and ranking information from the ranked suboptimal demonstration set and to outperform the demonstrator. Experiments on MuJoCo tasks show that our method can learn from a suboptimal demonstration set and achieve better performance than baseline methods.
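The abstract does not define Performance Relative Advantage or the reconstructed GAIL objective, so neither is reproduced here. As a rough illustration of the general idea of ranking a suboptimal demonstration set and learning from the ranking order, the sketch below makes two assumptions that are not taken from the paper: cumulative reward stands in as the ranking signal, and a Bradley-Terry style pairwise loss (the kind used in trajectory-ranked reward learning) stands in for the ranking term. All function and field names are hypothetical.

```python
import math
from itertools import combinations


def rank_trajectories(trajectories):
    """Sort trajectories from worst to best.

    Assumption: cumulative reward is used as the ranking signal. The paper's
    own criterion (Performance Relative Advantage) is not specified in the
    abstract and is not implemented here.
    """
    return sorted(trajectories, key=lambda traj: sum(traj["rewards"]))


def pairwise_ranking_loss(scores):
    """Bradley-Terry style loss over scores listed from worst to best.

    For every pair (i, j) with i < j, trajectory j is ranked higher, so the
    loss is small when scores[j] > scores[i].
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        # -log P(j preferred over i) = log(1 + exp(s_i - s_j))
        losses.append(math.log1p(math.exp(scores[i] - scores[j])))
    return sum(losses) / len(losses)


# Hypothetical toy demonstrations; real ones would also hold states and actions.
demos = [
    {"name": "b", "rewards": [1.0, 2.0, 0.5]},
    {"name": "a", "rewards": [0.1, 0.2, 0.0]},
    {"name": "c", "rewards": [2.0, 2.5, 3.0]},
]

ranked = rank_trajectories(demos)
order = [d["name"] for d in ranked]

# Scores that agree with the ranking give a lower loss than inverted scores.
consistent = pairwise_ranking_loss([0.0, 1.0, 2.0])
inverted = pairwise_ranking_loss([2.0, 1.0, 0.0])
print(order, consistent < inverted)
```

In an adversarial imitation setting, scores like these would come from a learned model evaluated on each trajectory rather than being fixed numbers; how TRAIL combines ranking information with the discriminator objective and the replay buffer is described in the paper itself.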
Pages: 486 - 493
Page count: 8
Related papers
50 items in total
  • [1] MA-TREX: Multi-agent Trajectory-Ranked Reward Extrapolation via Inverse Reinforcement Learning
    Huang, Sili
    Yang, Bo
    Chen, Hechang
    Piao, Haiyin
    Sun, Zhixiao
    Chang, Yi
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2020), PT II, 2020, 12275 : 3 - 14
  • [2] Generative Adversarial Network for Imitation Learning from Single Demonstration
    Tho Nguyen Duc
    Chanh Minh Tran
    Phan Xuan Tan
    Kamioka, Eiji
    BAGHDAD SCIENCE JOURNAL, 2021, 18 (04) : 1350 - 1355
  • [3] Adversarial Imitation Learning via Random Search
    Shin, MyungJae
    Kim, Joongheon
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [4] Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
    Liu, Zhihan
    Zhang, Yufeng
    Fu, Zuyue
    Yang, Zhuoran
    Wang, Zhaoran
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [5] Urban Vehicle Trajectory Generation Based on Generative Adversarial Imitation Learning
    Wang, Min
    Cui, Jianqun
    Wong, Yew Wee
    Chang, Yanan
    Wu, Libing
    Jin, Jiong
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73 (12) : 18237 - 18249
  • [6] Multimodal Storytelling via Generative Adversarial Imitation Learning
    Chen, Zhiqian
    Zhang, Xuchao
    Boedihardjo, Arnold P.
    Dai, Jing
    Lu, Chang-Tien
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3967 - 3973
  • [7] Perception-Aware-Based UAV Trajectory Planner via Generative Adversarial Self-Imitation Learning From Demonstrations
    Zhang, Hanxuan
    Huo, Ju
    Huang, Yulong
    Cheng, Jiajun
    Li, Xiaofeng
    IEEE INTERNET OF THINGS JOURNAL, 2025, 12 (03) : 3248 - 3260
  • [8] Imitation Learning from Imperfect Demonstration
    Wu, Yueh-Hua
    Charoenphakdee, Nontawat
    Bao, Han
    Tangkaratt, Voot
    Sugiyama, Masashi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [9] Deep Adversarial Imitation Learning of Locomotion Skills from One-shot Video Demonstration
    Zhang, Huiwen
    Liu, Yuwang
    Zhou, Weijia
    2019 9TH IEEE ANNUAL INTERNATIONAL CONFERENCE ON CYBER TECHNOLOGY IN AUTOMATION, CONTROL, AND INTELLIGENT SYSTEMS (IEEE-CYBER 2019), 2019, : 1257 - 1261
  • [10] Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations
    Zhao, Tianxiang
    Yu, Wenchao
    Wang, Suhang
    Wang, Lu
    Zhang, Xiang
    Chen, Yuncong
    Liu, Yanchi
    Cheng, Wei
    Chen, Haifeng
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 3513 - 3524