Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Cited: 22
Authors
Lim, Jaehyun [1 ]
Ha, Seungchul [1 ]
Choi, Jongeun [1 ]
Institution
[1] Yonsei Univ, Sch Mech Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Gaussian processes; inverse reinforcement learning; mobile robots
DOI
10.1109/TMECH.2020.2993564
Chinese Library Classification
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Inverse reinforcement learning (IRL) is a technique for automatically acquiring reward functions; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a very limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning with different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectory datasets. The proposed approach is demonstrated on obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robot can clone the experts' optimality in obstacle-avoiding navigation trajectories using only a very small number of expert demonstrations (e.g., ≤ 6). The proposed approach therefore shows great potential for application to complex real-world problems in an expert-data-efficient manner.
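To make the regression step described in the abstract concrete, below is a minimal sketch in plain NumPy: a GP maps trajectory-level feature vectors to the reward parameters that generated them, then evaluates its posterior on a handful of expert demonstrations. All data and names here (the 4-D feature space, the synthetic targets, X_expert) are hypothetical stand-ins, and the paper's actual method additionally uses a sparse GP approximation with ℓ1-regularization, which this sketch omits.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # Standard GP posterior mean/variance via a Cholesky solve.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = rbf_kernel(X_test, X_train)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical training set: each row summarizes one DRL-generated trajectory,
# and each target is the reward parameter that produced that trajectory.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 4))                       # trajectory features
true_w = np.array([1.0, -0.5, 0.0, 0.0])                 # synthetic ground truth
y_train = X_train @ true_w + 0.05 * rng.normal(size=50)  # reward parameters

# Predict the reward parameter for a handful of expert demonstrations (<= 6).
X_expert = rng.normal(size=(6, 4))
reward_mean, reward_var = gp_predict(X_train, y_train, X_expert)
print(reward_mean, np.sqrt(reward_var))                  # posterior mean and std
```

The posterior variance returned alongside the mean is what makes a GP a natural fit here: with so few expert demonstrations, the predicted reward comes with an uncertainty estimate rather than a point guess.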
Pages: 1739-1746 (8 pages)
Related Papers (50 total)
  • [41] Gaussian process model based reinforcement learning
    Yoo J.H.
    Journal of Institute of Control, Robotics and Systems, 2019, 25 (08) : 746 - 751
  • [42] Deep Inverse Reinforcement Learning by Logistic Regression
    Uchibe, Eiji
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 23 - 31
  • [43] DEEP REINFORCEMENT LEARNING FOR VIDEO PREDICTION
    Ho, Yung-Han
    Cho, Chuan-Yuan
    Peng, Wen-Hsiao
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 604 - 608
  • [44] Deep Reinforcement Learning for Stock Prediction
    Zhang, Junhao
    Lei, Yifei
    SCIENTIFIC PROGRAMMING, 2022, 2022
  • [45] Learning Robust Representation for Reinforcement Learning with Distractions by Reward Sequence Prediction
    Zhou, Qi
    Wang, Jie
    Liu, Qiyuan
    Kuang, Yufei
    Zhou, Wengang
    Li, Houqiang
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2551 - 2562
  • [46] Probabilistic prediction model for critical chloride concentration of reinforcement corrosion based on improved Gaussian process regression
    Zhou, Huanyu
    Wang, Zizhen
    Chen, Xiaojie
    Yu, Bo
MAGAZINE OF CONCRETE RESEARCH, 2024
  • [47] Parallel Placement of Virtualized Network Functions via Federated Deep Reinforcement Learning
    Huang, Haojun
    Tian, Jialin
    Min, Geyong
    Yin, Hao
    Zeng, Cheng
    Zhao, Yangming
    Wu, Dapeng Oliver
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (04) : 2936 - 2949
  • [48] Learning to Walk via Deep Reinforcement Learning
    Haarnoja, Tuomas
    Ha, Sehoon
    Zhou, Aurick
    Tan, Jie
    Tucker, George
    Levine, Sergey
    ROBOTICS: SCIENCE AND SYSTEMS XV, 2019,
  • [49] Exploring the design of reward functions in deep reinforcement learning-based vehicle velocity control algorithms
    He, Yixu
    Liu, Yang
    Yang, Lan
    Qu, Xiaobo
TRANSPORTATION LETTERS-THE INTERNATIONAL JOURNAL OF TRANSPORTATION RESEARCH, 2024, 16 (10) : 1338 - 1352
  • [50] Model Learning with Local Gaussian Process Regression
    Nguyen-Tuong, Duy
    Seeger, Matthias
    Peters, Jan
    ADVANCED ROBOTICS, 2009, 23 (15) : 2015 - 2034