Inverse Reinforcement Learning of Interaction Dynamics from Demonstrations

Cited: 0
Authors
Hussein, Mostafa [1 ]
Begum, Momotaz [1 ]
Petrik, Marek [2 ]
Affiliations
[1] Univ New Hampshire, Cognit Assist Robot Lab, Durham, NH 03824 USA
[2] Univ New Hampshire, Dept Comp Sci, Durham, NH 03824 USA
Funding
US National Science Foundation;
Keywords
AUTISM; CHILDREN; ROBOTS; INTERVENTIONS; BEHAVIOR; THERAPY;
DOI
10.1109/icra.2019.8793867
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
This paper presents a framework for learning the reward function underlying high-level sequential tasks from demonstrations. The purpose of reward learning, in the context of learning from demonstration (LfD), is to generate policies that mimic the demonstrator's policies, thereby enabling imitation learning. We focus on a human-robot interaction (HRI) domain where the goal is to learn and model structured interactions between a human and a robot. Such interactions can be modeled as a partially observable Markov decision process (POMDP), where the partial observability arises from uncertainty about how humans respond to different stimuli. The key challenge in finding a good policy in such a POMDP is determining the reward function that the demonstrator implicitly optimizes. Existing inverse reinforcement learning (IRL) methods for POMDPs are computationally very expensive, and the problem is not well understood. In contrast, IRL algorithms for Markov decision processes (MDPs) are well defined and computationally efficient. We propose an approach to learning reward functions for high-level sequential tasks from human demonstrations: the core idea is to reduce the underlying POMDP to an MDP and apply any efficient MDP-IRL algorithm. Our extensive experiments suggest that the reward function learned this way generates POMDP policies that closely mimic the demonstrator's policies.
Pages: 2267 - 2274
Page count: 8
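
As a concrete illustration of the abstract's core idea, the sketch below assumes the interaction POMDP has already been reduced to a small tabular MDP and hands it to a standard MDP-IRL algorithm. This is not the authors' implementation: the reduced states, transition tensor, and demonstrations are toy placeholders, and maximum-entropy IRL with a linear reward over one-hot state features is used only as one example of an "efficient MDP-IRL algorithm".

```python
# Illustrative sketch only: reduce-then-solve, with MaxEnt IRL as the
# stand-in MDP-IRL algorithm. All names and dynamics are assumptions.
import numpy as np

def maxent_irl(P, demos, gamma=0.95, lr=0.1, epochs=200, horizon=None):
    """Tabular maximum-entropy IRL on the reduced MDP.

    P      : (A, S, S) tensor, P[a, s, t] = Pr(next state t | s, a).
    demos  : list of demonstrator state sequences.
    Returns a per-state reward estimate r of shape (S,).
    """
    A, S, _ = P.shape
    if horizon is None:
        horizon = max(len(t) for t in demos)

    # With one-hot state features, the demonstrator's feature
    # expectations are just average state-visitation counts.
    mu_demo, p0 = np.zeros(S), np.zeros(S)
    for traj in demos:
        p0[traj[0]] += 1.0 / len(demos)
        for s in traj:
            mu_demo[s] += 1.0 / len(demos)

    r = np.zeros(S)
    for _ in range(epochs):
        # Soft value iteration under the current reward estimate.
        V = np.zeros(S)
        for _ in range(100):
            Q = r[None, :] + gamma * (P @ V)           # (A, S)
            m = Q.max(axis=0)
            V = m + np.log(np.exp(Q - m).sum(axis=0))  # log-sum-exp over a
        pi = np.exp(Q - V[None, :])                    # soft-optimal policy

        # Expected visitations under pi, matched to the demo horizon.
        d, mu = p0.copy(), np.zeros(S)
        for _ in range(horizon):
            mu += d
            d = np.einsum('as,ast->t', pi * d[None, :], P)

        # Gradient ascent on the MaxEnt log-likelihood (linear reward).
        r += lr * (mu_demo - mu)
    return r

if __name__ == "__main__":
    # Toy 2-action, 3-state reduced MDP; the demonstrator lingers in
    # state 2, so its learned reward should come out largest.
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(3), size=(2, 3))         # (A, S, S)
    demos = [[0, 1, 2, 2, 2], [0, 0, 1, 2, 2]]
    print(maxent_irl(P, demos))
```

Any other tabular MDP-IRL method (linear-programming IRL, Bayesian IRL, and so on) could replace maxent_irl without changing the surrounding recipe; the reduction step is what makes these efficient solvers applicable in the first place.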