Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Cited by: 19
Authors
Lim, Jaehyun [1 ]
Ha, Seungchul [1 ]
Choi, Jongeun [1 ]
Affiliations
[1] Yonsei Univ, Sch Mech Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Gaussian processes; inverse reinforcement learning; mobile robots; MOBILE; SELECTION;
DOI
10.1109/TMECH.2020.2993564
CLC number
TP [automation and computer technology]
Discipline code
0812
Abstract
Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with l1-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning under different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectories. To demonstrate the approach, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robots can clone the experts' optimality in navigation trajectories, avoiding obstacles, using only a very small number of expert demonstrations (e.g., <= 6). The proposed approach therefore shows great potential for complex real-world applications in an expert-data-efficient manner.
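The core idea in the abstract is to regress from trajectory data to reward values with a GP. Below is a minimal sketch of standard GP regression with an RBF kernel in plain NumPy, mapping toy "trajectory feature" vectors to scalar reward labels. All names, features, and hyperparameters here are hypothetical illustrations only; the paper's actual method additionally uses sparse GPs and l1-regularization, which this sketch does not implement:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # Standard GP regression posterior (mean and per-point variance).
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy usage: 20 trajectories summarized by 4 features each, with
# surrogate linear "reward" labels standing in for real DRL returns.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
w_true = np.array([1.0, -0.5, 0.3, 0.0])
y = X @ w_true
mean, var = gp_predict(X, y, X[:3])
```

With a small noise level, the posterior mean at the training inputs nearly interpolates the labels, and the posterior variance there collapses toward the noise floor, which is the behavior that lets a trained GP rank candidate reward functions against limited expert data.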
Pages: 1739-1746
Page count: 8
Related Papers
(50 records)
  • [31] Deep Reinforcement Learning for Video Summarization with Semantic Reward
    Sun, Haoran
    Zhu, Xiaolong
    Zhou, Conghua
    [J]. 2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY, AND SECURITY COMPANION, QRS-C, 2022, : 754 - 755
  • [32] Bearing remaining life prediction using Gaussian process regression with composite kernel functions
    Hong, Sheng
    Zhou, Zheng
    Lu, Chen
    Wang, Baoqing
    Zhao, Tingdi
    [J]. JOURNAL OF VIBROENGINEERING, 2015, 17 (02) : 695 - 704
  • [34] Episodic Reinforcement Learning by Logistic Reward-Weighted Regression
    Wierstra, Daan
    Schaul, Tom
    Peters, Jan
    Schmidhuber, Juergen
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 407 - +
  • [35] Longitudinal Deep Kernel Gaussian Process Regression
    Liang, Junjie
    Wu, Yanting
    Xu, Dongkuan
    Honavar, Vasant G.
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8556 - 8564
  • [36] Gaussian process regression for tool wear prediction
    Kong, Dongdong
    Chen, Yongjie
    Li, Ning
    [J]. MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2018, 104 : 556 - 574
  • [37] Tracking-by-Fusion via Gaussian Process Regression Extended to Transfer Learning
    Gao, Jin
    Wang, Qiang
    Xing, Junliang
    Ling, Haibin
    Hu, Weiming
    Maybank, Stephen
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (04) : 939 - 955
  • [38] Learning-driven Nonlinear Optimal Control via Gaussian Process Regression
    Sforni, Lorenzo
    Notarnicola, Ivano
    Notarstefano, Giuseppe
    [J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 4412 - 4417
  • [39] Gaussian process model based reinforcement learning
    Yoo J.H.
    [J]. Journal of Institute of Control, Robotics and Systems, 2019, 25 (08) : 746 - 751
  • [40] Deep Inverse Reinforcement Learning by Logistic Regression
    Uchibe, Eiji
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 23 - 31