Model-free inverse reinforcement learning with multi-intention, unlabeled, and overlapping demonstrations

Cited by: 1
Authors
Bighashdel, Ariyan [1 ]
Jancura, Pavol [1 ]
Dubbelman, Gijs [1 ]
Affiliations
[1] Eindhoven Univ Technol, Elect Engn, NL-5612 AZ Eindhoven, Netherlands
Keywords
Inverse reinforcement learning; Multi-intention; Model-free reinforcement learning; Unlabeled demonstrations; Overlapping demonstrations; Mixture of logistic regressions;
DOI
10.1007/s10994-022-06273-x
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we define a novel inverse reinforcement learning (IRL) problem in which the demonstrations are multi-intention, i.e., collected from multi-intention experts, unlabeled, i.e., without intention labels, and partially overlapping, i.e., shared between multiple intentions. In the presence of overlapping demonstrations, current IRL methods, developed to handle multi-intention and unlabeled demonstrations, cannot successfully learn the underlying reward functions. To address this limitation, we propose a novel clustering-based approach to disentangle the observed demonstrations and experimentally validate its advantages. Traditional clustering-based approaches to multi-intention IRL, which build on model-based reinforcement learning (RL), formulate the problem using parametric density estimation. However, in high-dimensional environments with unknown system dynamics, i.e., model-free RL, the solution of parametric density estimation is tractable only up to the density normalization constant. To overcome this, we formulate the problem as a mixture of logistic regressions that directly handles the unnormalized density. To study the challenges posed by overlapping demonstrations, we introduce the concepts of a shared pair, a state-action pair that is shared by more than one intention, and separability, which reflects how well the multiple intentions can be separated in the joint state-action space. We provide theoretical analyses under the global optimality condition and the existence of shared pairs. Furthermore, we conduct extensive experiments on four simulated robotics tasks, extended to accept different intentions with specific levels of separability, and on a synthetic driver task designed to directly control the separability. We evaluate the existing baselines on our defined problem and demonstrate, both theoretically and experimentally, the advantages of our clustering-based solution, especially as the separability of the demonstrations decreases.
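As an illustration of the formulation the abstract names, below is a minimal, hypothetical sketch of clustering unlabeled demonstrations with a mixture of logistic regressions fit by EM. The function name em_mixture_logreg, the data layout (expert state-action pairs labeled 1, sampled background pairs labeled 0), and the fixed-step gradient M-step are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def em_mixture_logreg(X, y, K=2, n_iters=100, lr=0.5, seed=0):
    """EM for a mixture of K logistic regressions (illustrative sketch).

    X: (N, D) joint state-action features; y: (N,) labels, 1 for expert
    pairs and 0 for sampled background pairs, so each component acts as a
    discriminator and no density normalization constant is ever computed.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    W = rng.normal(scale=0.01, size=(K, D))  # per-intention weights
    pi = np.full(K, 1.0 / K)                 # mixture proportions
    for _ in range(n_iters):
        # E-step: responsibility of intention k for each pair i.
        P = sigmoid(X @ W.T)                          # (N, K)
        lik = np.where(y[:, None] == 1, P, 1.0 - P)   # Bernoulli likelihood
        R = pi * lik
        R /= R.sum(axis=1, keepdims=True) + 1e-12
        # M-step: one responsibility-weighted gradient-ascent step on
        # each component's logistic log-likelihood, then update pi.
        for k in range(K):
            grad = X.T @ (R[:, k] * (y - P[:, k])) / N
            W[k] += lr * grad
        pi = R.mean(axis=0)
    return W, pi, R
```

On synthetic data with two well-separated intentions, the responsibilities R tend toward one-hot assignments; as shared pairs appear and separability drops, they become soft, which is the regime the paper analyzes.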
Pages: 2263-2296
Page count: 34
Related papers
50 items in total
  • [31] Model-Free Reinforcement Learning for Mean Field Games
    Mishra, Rajesh
    Vasal, Deepanshu
    Vishwanath, Sriram
    IEEE Transactions on Control of Network Systems, 2023, 10(4): 2141-2151
  • [32] Counterfactual Credit Assignment in Model-Free Reinforcement Learning
    Mesnard, Thomas
    Weber, Theophane
    Viola, Fabio
    Thakoor, Shantanu
    Saade, Alaa
    Harutyunyan, Anna
    Dabney, Will
    Stepleton, Tom
    Heess, Nicolas
    Guez, Arthur
    Moulines, Eric
    Hutter, Marcus
    Buesing, Lars
    Munos, Remi
    International Conference on Machine Learning (ICML), 2021, vol. 139
  • [33] Covariance matrix adaptation for model-free reinforcement learning
    (Original French title: Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct)
    Lavoisier, France, 2013, 27
  • [34] Driving in Dense Traffic with Model-Free Reinforcement Learning
    Saxena, Dhruv Mauria
    Bae, Sangjae
    Nakhaei, Alireza
    Fujimura, Kikuo
    Likhachev, Maxim
    2020 IEEE International Conference on Robotics and Automation (ICRA): 5385-5392
  • [35] Model-Free Reinforcement Learning with Continuous Action in Practice
    Degris, Thomas
    Pilarski, Patrick M.
    Sutton, Richard S.
    2012 American Control Conference (ACC): 2177-2182
  • [36] Robotic Table Tennis with Model-Free Reinforcement Learning
    Gao, Wenbo
    Graesser, Laura
    Choromanski, Krzysztof
    Song, Xingyou
    Lazic, Nevena
    Sanketi, Pannag
    Sindhwani, Vikas
    Jaitly, Navdeep
    2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS): 5556-5563
  • [37] Model-Free Online Reinforcement Learning of a Robotic Manipulator
    Sweafford, Jerry, Jr.
    Fahimi, Farbod
    Mechatronic Systems and Control, 2019, 47(3): 136-143
  • [38] MFRLMO: Model-free reinforcement learning for multi-objective optimization of Apache Spark
    Öztürk, Muhammed Maruf
    EAI Endorsed Transactions on Scalable Information Systems, 2024, 11(5): 1-15
  • [39] Model-Free Reinforcement Learning for Fully Cooperative Multi-Agent Graphical Games
    Zhang, Qichao
    Zhao, Dongbin
    Lewis, Frank L.
    2018 International Joint Conference on Neural Networks (IJCNN)
  • [40] Model-free Reinforcement Learning based Multi-stage Smart Noise Jamming
    Wang, Yuanhang
    Zhang, Tianxian
    Xu, Longxiao
    Tian, Tuanwei
    Kong, Lingjiang
    Yang, Xiaobo
    2019 IEEE Radar Conference (RadarConf)