Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Cited by: 19
Authors
Lim, Jaehyun [1 ]
Ha, Seungchul [1 ]
Choi, Jongeun [1 ]
Affiliation
[1] Yonsei Univ, Sch Mech Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Gaussian processes; inverse reinforcement learning; mobile robots; MOBILE; SELECTION;
DOI
10.1109/TMECH.2020.2993564
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning with different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectories. To demonstrate the approach, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robots can clone the experts' optimality in navigation trajectories while avoiding obstacles, using only a very small number of expert demonstrations (e.g., <= 6). The proposed approach therefore shows great potential for complex real-world applications in an expert-data-efficient manner.
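The core regression step the abstract describes, predicting a reward quantity from trajectory features via a Gaussian process, can be illustrated with a minimal numpy sketch of standard (dense) GP regression. This omits the paper's sparse approximation and ℓ1-regularization; the feature vectors, kernel hyperparameters, and toy data below are illustrative assumptions, not the authors' setup.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared-exponential (RBF) kernel between the rows of A and B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * sq / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise_var=1e-2):
    """GP regression posterior mean and variance at X_test.

    X_train : (n, d) trajectory feature vectors with known reward values y_train.
    X_test  : (m, d) feature vectors of new (e.g., expert) trajectories.
    """
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    # Cholesky-based solve for numerical stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_test, X_test) - v.T @ v)
    return mean, var

# Toy usage: 1-D "trajectory features" paired with scalar reward values.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.5])
mean, var = gp_predict(X, y, np.array([[1.5]]))
```

The posterior variance is what makes a GP attractive here: it quantifies how confident the reward prediction is for a new expert trajectory, which matters when only a handful of demonstrations (<= 6 in the paper's experiments) are available.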
Pages: 1739-1746
Page count: 8
Related Papers (50 total)
  • [21] Automated Assessment of Bone Age Using Deep Learning and Gaussian Process Regression
    Van Steenkiste, Tom
    Ruyssinck, Joeri
    Janssens, Olivier
    Vandersmissen, Baptist
    Vandecasteele, Florian
    Devolder, Pieter
    Achten, Eric
    Van Hoecke, Sofie
    Deschrijver, Dirk
    Dhaene, Tom
    [J]. 2018 40TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2018, : 674 - 677
  • [22] Inverse Reinforcement Learning with Locally Consistent Reward Functions
    Quoc Phong Nguyen
    Low, Kian Hsiang
    Jaillet, Patrick
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [23] Nonstationary covariance functions for Gaussian process regression
    Paciorek, CJ
    Schervish, MJ
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 273 - 280
  • [24] Deep reinforcement learning-based rehabilitation robot trajectory planning with optimized reward functions
    Wang, Xusheng
    Xie, Jiexin
    Guo, Shijie
    Li, Yue
    Sun, Pengfei
    Gan, Zhongxue
    [J]. ADVANCES IN MECHANICAL ENGINEERING, 2021, 13 (12)
  • [25] Accurate Prediction of Network Distance via Federated Deep Reinforcement Learning
    Huang, Haojun
    Cai, Yiming
    Min, Geyong
    Wang, Haozhe
    Liu, Gaoyang
    Wu, Dapeng Oliver
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (04) : 3301 - 3314
  • [26] Accurate Prediction of Required Virtual Resources via Deep Reinforcement Learning
    Huang, Haojun
    Li, Zhaoxi
    Tian, Jialin
    Min, Geyong
    Miao, Wang
    Wu, Dapeng Oliver
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (02) : 920 - 933
  • [27] Prediction of Boiler Combustion Energy Efficiency via Deep Reinforcement Learning
    Jiang, Hui
    Cai, Ziyun
    Zhang, Tengfei
    Peng, Chen
    [J]. 2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 2658 - 2662
  • [28] Variance aware reward smoothing for deep reinforcement learning
    Dong, Yunlong
    Zhang, Shengjun
    Liu, Xing
    Zhang, Yu
    Shen, Tan
    [J]. NEUROCOMPUTING, 2021, 458 : 327 - 335
  • [29] Reward Space Noise for Exploration in Deep Reinforcement Learning
    Sun, Chuxiong
    Wang, Rui
    Li, Qian
    Hu, Xiaohui
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (10)
  • [30] Deep reinforcement learning with reward design for quantum control
    Yu, Haixu
    Zhao, Xudong
    [J]. IEEE Transactions on Artificial Intelligence, 2024, 5 (03): : 1087 - 1101