Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Cited by: 19
Authors
Lim, Jaehyun [1 ]
Ha, Seungchul [1 ]
Choi, Jongeun [1 ]
Affiliations
[1] Yonsei Univ, Sch Mech Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Gaussian processes; inverse reinforcement learning; mobile robots; MOBILE; SELECTION
DOI
10.1109/TMECH.2020.2993564
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Subject Classification Code
0812
Abstract
Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a very limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning under different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectories. To demonstrate the method, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robot can clone the experts' optimality in navigation trajectories, avoiding obstacles, using only a very small number of expert demonstrations (e.g., ≤ 6). The proposed approach therefore shows great potential for complex real-world applications in an expert-data-efficient manner.
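The abstract's core mechanism — regressing reward values from trajectory data with a GP — can be sketched as plain GP regression over trajectory feature vectors. This is a minimal illustrative sketch only: the feature construction, RBF kernel, and synthetic data below are assumptions, not the authors' actual pipeline, which uses a sparse GP with ℓ1-regularization trained on trajectory-reward pairs from deep RL.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-3):
    # Standard GP regression posterior mean (zero prior mean).
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

# Toy data: each row is a feature vector summarizing one demonstration
# trajectory; each target is a scalar reward-parameter value.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))           # 20 trajectories, 4 features each
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]    # synthetic "reward" targets
X_new = rng.normal(size=(3, 4))        # unseen expert trajectories
pred = gp_predict(X, y, X_new)
print(pred.shape)  # (3,)
```

In the paper's setting the training pairs would come from deep RL runs under different reward functions, and the test inputs from the (≤ 6) expert demonstrations; the sparse-GP and ℓ1 machinery then keeps the predictor tractable with few demonstrations.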
Pages: 1739-1746