Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Cited by: 19
Authors
Lim, Jaehyun [1 ]
Ha, Seungchul [1 ]
Choi, Jongeun [1 ]
Affiliations
[1] Yonsei Univ, Sch Mech Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Gaussian processes; inverse reinforcement learning; mobile robots; MOBILE; SELECTION
DOI
10.1109/TMECH.2020.2993564
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Subject Classification Code
0812
Abstract
Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with ℓ1-regularization, using only a very limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning under different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectories. To demonstrate the method, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robot can clone the experts' optimality in navigation trajectories, avoiding obstacles, using only a very small number of expert demonstrations (e.g., ≤ 6). The proposed approach therefore shows great potential for complex real-world applications in an expert-data-efficient manner.
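The abstract's core mechanism — regressing reward values from trajectory data with a GP — can be sketched as plain GP regression over trajectory feature vectors. This is a minimal illustrative sketch only: the feature construction, RBF kernel, and synthetic data below are assumptions, not the authors' actual pipeline, which uses a sparse GP with ℓ1-regularization trained on trajectory-reward pairs from deep RL.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-3):
    # Standard GP regression posterior mean (zero prior mean).
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

# Toy data: each row is a feature vector summarizing one demonstration
# trajectory; each target is a scalar reward-parameter value.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))           # 20 trajectories, 4 features each
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]    # synthetic "reward" targets
X_new = rng.normal(size=(3, 4))        # unseen expert trajectories
pred = gp_predict(X, y, X_new)
print(pred.shape)  # (3,)
```

In the paper's setting the training pairs would come from deep RL runs under different reward functions, and the test inputs from the (≤ 6) expert demonstrations; the sparse-GP and ℓ1 machinery then keeps the predictor tractable with few demonstrations.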
Pages: 1739-1746