Model-free inverse reinforcement learning with multi-intention, unlabeled, and overlapping demonstrations

Cited by: 1
Authors
Bighashdel, Ariyan [1 ]
Jancura, Pavol [1 ]
Dubbelman, Gijs [1 ]
Affiliations
[1] Eindhoven University of Technology, Electrical Engineering, NL-5612 AZ Eindhoven, Netherlands
Keywords
Inverse reinforcement learning; Multi-intention; Model-free reinforcement learning; Unlabeled demonstrations; Overlapping demonstrations; Mixture of logistic regressions
DOI
10.1007/s10994-022-06273-x
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
In this paper, we define a novel inverse reinforcement learning (IRL) problem in which the demonstrations are multi-intention, i.e., collected from multi-intention experts; unlabeled, i.e., without intention labels; and partially overlapping, i.e., shared between multiple intentions. In the presence of overlapping demonstrations, current IRL methods, developed to handle multi-intention and unlabeled demonstrations, cannot successfully learn the underlying reward functions. To address this limitation, we propose a novel clustering-based approach to disentangle the observed demonstrations and experimentally validate its advantages. Traditional clustering-based approaches to multi-intention IRL, which build on model-based reinforcement learning (RL), formulate the problem using parametric density estimation. However, in high-dimensional environments with unknown system dynamics, i.e., model-free RL, the parametric density estimate is tractable only up to the density normalization constant. To overcome this, we formulate the problem as a mixture of logistic regressions that directly handles the unnormalized density. To study the challenges posed by overlapping demonstrations, we introduce the concepts of a shared pair, a state-action pair that belongs to more than one intention, and separability, which quantifies how well the multiple intentions can be separated in the joint state-action space. We provide theoretical analyses under the global optimality condition and the existence of shared pairs. Furthermore, we conduct extensive experiments on four simulated robotics tasks, extended to accept different intentions with specific levels of separability, and on a synthetic driver task developed to directly control the separability. We evaluate the existing baselines on our defined problem and demonstrate, theoretically and experimentally, the advantages of our clustering-based solution, especially as the separability of the demonstrations decreases.
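For intuition, the mixture-of-logistic-regressions formulation can be sketched as an EM-style procedure. The following minimal Python sketch assumes a noise-contrastive setup in which every state-action feature vector is labeled expert (1) or noise (0) and each mixture component is a single logistic regression; all names here (fit_mixture_logreg, X, y, K) are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

    def fit_mixture_logreg(X, y, K, n_iter=200, lr=0.1, seed=0):
        # X: (N, D) state-action features; y: (N,) 1 = expert pair, 0 = noise pair.
        # Returns per-intention weights W (K, D) and mixing proportions pi (K,).
        rng = np.random.default_rng(seed)
        N, D = X.shape
        W = 0.01 * rng.standard_normal((K, D))  # one logistic regression per intention
        pi = np.full(K, 1.0 / K)
        for _ in range(n_iter):
            # E-step: responsibility of intention k for each sample, proportional
            # to pi_k times the Bernoulli likelihood under component k.
            p = sigmoid(X @ W.T)                           # (N, K): P(y=1 | x, k)
            lik = np.where(y[:, None] == 1.0, p, 1.0 - p)
            r = pi[None, :] * lik
            r /= r.sum(axis=1, keepdims=True)
            # M-step: responsibility-weighted gradient ascent on each component's
            # logistic log-likelihood, then refreshed mixing proportions.
            for k in range(K):
                W[k] += lr * (X.T @ (r[:, k] * (y - p[:, k]))) / N
            pi = r.mean(axis=0)
        return W, pi

    # Hypothetical usage on synthetic data with two intentions:
    rng = np.random.default_rng(1)
    X = rng.standard_normal((400, 2))
    y = (X[:, 0] > 0).astype(float)   # stand-in for expert-vs-noise labels
    W, pi = fit_mixture_logreg(X, y, K=2)

The point of this sketch, matching the abstract: each component's logit X @ W[k] acts as an unnormalized log-density ratio, so no normalization constant is ever computed, and the E-step responsibilities give the soft clustering of demonstrations into intentions, including soft assignments for shared pairs.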
Pages: 2263-2296
Number of pages: 34
Related papers (items [21]-[30] of 50 shown)
  • [21] On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
    Mao, Weichao
    Yang, Lin F.
    Zhang, Kaiqing
    Basar, Tamer
    International Conference on Machine Learning, Vol. 162, 2022
  • [22] Analysis of Inverse Reinforcement Learning with Perturbed Demonstrations
    Melo, Francisco S.
    Lopes, Manuel
    Ferreira, Ricardo
    ECAI 2010 - 19th European Conference on Artificial Intelligence, 2010, 215: 349-354
  • [23] Review-based Multi-intention Contrastive Learning for Recommendation
    Yang, Wei
    Huo, Tengfei
    Liu, Zhiqiang
    Lu, Chi
    Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, 2023: 2339-2343
  • [24] Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey
    Liu, Yongshuai
    Halev, Avishai
    Liu, Xin
    Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, 2021: 4508-4515
  • [25] Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations
    Qiao, Guanren
    Liu, Guiliang
    Poupart, Pascal
    Xu, Zhiqiang
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [26] Constrained model-free reinforcement learning for process optimization
    Pan, Elton
    Petsagkourakis, Panagiotis
    Mowbray, Max
    Zhang, Dongda
    del Rio-Chanona, Ehecatl Antonio
    Computers & Chemical Engineering, 2021, 154
  • [27] Improving Optimistic Exploration in Model-Free Reinforcement Learning
    Grzes, Marek
    Kudenko, Daniel
    Adaptive and Natural Computing Algorithms, 2009, 5495: 360-369
  • [28] Model-Free Preference-Based Reinforcement Learning
    Wirth, Christian
    Fuernkranz, Johannes
    Neumann, Gerhard
    Thirtieth AAAI Conference on Artificial Intelligence, 2016: 2222-2228
  • [29] Model-Free μ Synthesis via Adversarial Reinforcement Learning
    Keivan, Darioush
    Havens, Aaron
    Seiler, Peter
    Dullerud, Geir
    Hu, Bin
    2022 American Control Conference (ACC), 2022: 3335-3341
  • [30] An adaptive clustering method for model-free reinforcement learning
    Matt, A
    Regensburger, G
    INMIC 2004: 8th International Multitopic Conference, Proceedings, 2004: 362-367