Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression

Cited by: 19
Authors
Lim, Jaehyun [1 ]
Ha, Seungchul [1 ]
Choi, Jongeun [1 ]
Affiliations
[1] Yonsei Univ, Sch Mech Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Gaussian processes; inverse reinforcement learning; mobile robots; MOBILE; SELECTION;
DOI
10.1109/TMECH.2020.2993564
CLC number
TP [automation and computer technology]
Discipline code
0812
Abstract
Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with l1-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning under different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectories. To demonstrate the approach, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robots can clone the experts' optimality in navigation trajectories, avoiding obstacles, using only a very small number of expert demonstrations (e.g., <= 6). The proposed approach therefore shows great potential for complex real-world applications in an expert-data-efficient manner.
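The core idea in the abstract is to regress from trajectory data to reward values with a GP. Below is a minimal sketch of standard GP regression with an RBF kernel in plain NumPy, mapping toy "trajectory feature" vectors to scalar reward labels. All names, features, and hyperparameters here are hypothetical illustrations only; the paper's actual method additionally uses sparse GPs and l1-regularization, which this sketch does not implement:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # Standard GP regression posterior (mean and per-point variance).
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy usage: 20 trajectories summarized by 4 features each, with
# surrogate linear "reward" labels standing in for real DRL returns.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
w_true = np.array([1.0, -0.5, 0.3, 0.0])
y = X @ w_true
mean, var = gp_predict(X, y, X[:3])
```

With a small noise level, the posterior mean at the training inputs nearly interpolates the labels, and the posterior variance there collapses toward the noise floor, which is the behavior that lets a trained GP rank candidate reward functions against limited expert data.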
Pages: 1739-1746
Page count: 8
Related Papers
(50 records)
  • [31] Deep Reinforcement Learning for Video Summarization with Semantic Reward
    Sun, Haoran
    Zhu, Xiaolong
    Zhou, Conghua
    [J]. 2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY, AND SECURITY COMPANION, QRS-C, 2022, : 754 - 755
  • [32] Bearing remaining life prediction using Gaussian process regression with composite kernel functions
    Hong, Sheng
    Zhou, Zheng
    Lu, Chen
    Wang, Baoqing
    Zhao, Tingdi
    [J]. JOURNAL OF VIBROENGINEERING, 2015, 17 (02) : 695 - 704
  • [34] Episodic Reinforcement Learning by Logistic Reward-Weighted Regression
    Wierstra, Daan
    Schaul, Tom
    Peters, Jan
    Schmidhuber, Juergen
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 407 - +
  • [35] Longitudinal Deep Kernel Gaussian Process Regression
    Liang, Junjie
    Wu, Yanting
    Xu, Dongkuan
    Honavar, Vasant G.
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8556 - 8564
  • [36] Gaussian process regression for tool wear prediction
    Kong, Dongdong
    Chen, Yongjie
    Li, Ning
    [J]. MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2018, 104 : 556 - 574
  • [37] Tracking-by-Fusion via Gaussian Process Regression Extended to Transfer Learning
    Gao, Jin
    Wang, Qiang
    Xing, Junliang
    Ling, Haibin
    Hu, Weiming
    Maybank, Stephen
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (04) : 939 - 955
  • [38] Learning-driven Nonlinear Optimal Control via Gaussian Process Regression
    Sforni, Lorenzo
    Notarnicola, Ivano
    Notarstefano, Giuseppe
    [J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 4412 - 4417
  • [39] Gaussian process model based reinforcement learning
    Yoo J.H.
    [J]. Journal of Institute of Control, Robotics and Systems, 2019, 25 (08) : 746 - 751
  • [40] Deep Inverse Reinforcement Learning by Logistic Regression
    Uchibe, Eiji
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 23 - 31