Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Cited: 22
Authors
Lim, Jaehyun [1 ]
Ha, Seungchul [1 ]
Choi, Jongeun [1 ]
Institution
[1] Yonsei Univ, Sch Mech Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Gaussian processes; inverse reinforcement learning; mobile robots
DOI
10.1109/TMECH.2020.2993564
Chinese Library Classification
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Inverse reinforcement learning (IRL) is a technique for automatically acquiring reward functions; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a very limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning with different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectory datasets. The proposed approach is demonstrated on obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robot can clone the experts' optimality in obstacle-avoiding navigation trajectories using only a very small number of expert demonstrations (e.g., ≤ 6). The proposed approach therefore shows great potential for application to complex real-world problems in an expert-data-efficient manner.
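To make the regression step described in the abstract concrete, below is a minimal sketch in plain NumPy: a GP maps trajectory-level feature vectors to the reward parameters that generated them, then evaluates its posterior on a handful of expert demonstrations. All data and names here (the 4-D feature space, the synthetic targets, X_expert) are hypothetical stand-ins, and the paper's actual method additionally uses a sparse GP approximation with ℓ1-regularization, which this sketch omits.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # Standard GP posterior mean/variance via a Cholesky solve.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = rbf_kernel(X_test, X_train)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical training set: each row summarizes one DRL-generated trajectory,
# and each target is the reward parameter that produced that trajectory.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 4))                       # trajectory features
true_w = np.array([1.0, -0.5, 0.0, 0.0])                 # synthetic ground truth
y_train = X_train @ true_w + 0.05 * rng.normal(size=50)  # reward parameters

# Predict the reward parameter for a handful of expert demonstrations (<= 6).
X_expert = rng.normal(size=(6, 4))
reward_mean, reward_var = gp_predict(X_train, y_train, X_expert)
print(reward_mean, np.sqrt(reward_var))                  # posterior mean and std
```

The posterior variance returned alongside the mean is what makes a GP a natural fit here: with so few expert demonstrations, the predicted reward comes with an uncertainty estimate rather than a point guess.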
Pages: 1739-1746 (8 pages)
Related Papers (50 total)
  • [41] Gaussian process model based reinforcement learning
    Yoo J.H.
    Journal of Institute of Control, Robotics and Systems, 2019, 25 (08) : 746 - 751
  • [42] Deep Inverse Reinforcement Learning by Logistic Regression
    Uchibe, Eiji
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 23 - 31
  • [43] DEEP REINFORCEMENT LEARNING FOR VIDEO PREDICTION
    Ho, Yung-Han
    Cho, Chuan-Yuan
    Peng, Wen-Hsiao
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 604 - 608
  • [44] Deep Reinforcement Learning for Stock Prediction
    Zhang, Junhao
    Lei, Yifei
    SCIENTIFIC PROGRAMMING, 2022, 2022
  • [45] Learning Robust Representation for Reinforcement Learning with Distractions by Reward Sequence Prediction
    Zhou, Qi
    Wang, Jie
    Liu, Qiyuan
    Kuang, Yufei
    Zhou, Wengang
    Li, Houqiang
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2551 - 2562
  • [46] Probabilistic prediction model for critical chloride concentration of reinforcement corrosion based on improved Gaussian process regression
    Zhou, Huanyu
    Wang, Zizhen
    Chen, Xiaojie
    Yu, Bo
MAGAZINE OF CONCRETE RESEARCH, 2024
  • [47] Parallel Placement of Virtualized Network Functions via Federated Deep Reinforcement Learning
    Huang, Haojun
    Tian, Jialin
    Min, Geyong
    Yin, Hao
    Zeng, Cheng
    Zhao, Yangming
    Wu, Dapeng Oliver
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (04) : 2936 - 2949
  • [48] Learning to Walk via Deep Reinforcement Learning
    Haarnoja, Tuomas
    Ha, Sehoon
    Zhou, Aurick
    Tan, Jie
    Tucker, George
    Levine, Sergey
    ROBOTICS: SCIENCE AND SYSTEMS XV, 2019,
  • [49] Exploring the design of reward functions in deep reinforcement learning-based vehicle velocity control algorithms
    He, Yixu
    Liu, Yang
    Yang, Lan
    Qu, Xiaobo
TRANSPORTATION LETTERS-THE INTERNATIONAL JOURNAL OF TRANSPORTATION RESEARCH, 2024, 16 (10) : 1338 - 1352
  • [50] Model Learning with Local Gaussian Process Regression
    Nguyen-Tuong, Duy
    Seeger, Matthias
    Peters, Jan
    ADVANCED ROBOTICS, 2009, 23 (15) : 2015 - 2034