Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Cited: 23
Authors
Ngo Anh Vien [1 ,2 ]
Yu, Hwanjo [1 ]
Chung, TaeChoong [2 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, Data Min Lab, Pohang, South Korea
[2] Kyung Hee Univ, Sch Elect & Informat, Dept Comp Engn, Artificial Intelligence Lab, Yongin 446701, Gyeonggi, South Korea
Keywords
Markov decision process; Reinforcement learning; Bayesian policy gradient; Monte-Carlo policy gradient; Policy gradient; Hessian matrix distribution;
DOI
10.1016/j.ins.2011.01.001
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates compared with conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As with the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10]. (C) 2011 Elsevier Inc. All rights reserved.
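The abstract's core idea, using Hessian information as a learning rate schedule for the policy gradient step, can be illustrated with a minimal sketch. This is not the paper's Bayesian quadrature machinery; it assumes a toy quadratic performance measure with exact gradient and Hessian, and simply shows how the inverse Hessian preconditions the update, replacing a hand-tuned scalar step size (the `A`, `theta_opt`, and function names are illustrative, not from the paper):

```python
import numpy as np

# Toy performance measure eta(theta): a concave quadratic with known optimum.
# The paper instead estimates gradient and Hessian *distributions* from
# Monte-Carlo trajectories via Bayesian quadrature; here both are exact.
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # negative Hessian of eta (SPD)
theta_opt = np.array([1.0, -2.0])

def grad_eta(theta):
    return -A @ (theta - theta_opt)       # gradient of eta at theta

def hess_eta(theta):
    return -A                             # constant Hessian of eta

theta = np.zeros(2)
for _ in range(25):
    g = grad_eta(theta)
    H = hess_eta(theta)
    H = 0.5 * (H + H.T)                   # enforce symmetry, mirroring the
                                          # paper's symmetric posterior mean
    # Newton-style step: the inverse Hessian acts as a per-direction
    # learning rate schedule instead of a single scalar step size.
    theta = theta - np.linalg.solve(H, g)

print(np.allclose(theta, theta_opt))      # prints True
```

In the paper's setting, `grad_eta` and `hess_eta` would be replaced by the posterior means of the gradient and Hessian distributions estimated from sampled trajectories.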
Pages: 1671 - 1685 (15 pages)
Related Papers
(50 records)
  • [21] Batch Reinforcement Learning With a Nonparametric Off-Policy Policy Gradient
    Tosatto, Samuele
    Carvalho, Joao
    Peters, Jan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 5996 - 6010
  • [22] Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts
    Lee, Gilwoo
    Hou, Brian
    Choudhury, Sanjiban
    Srinivasa, Siddhartha S.
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 5611 - 5618
  • [23] Reinforcement learning with knowledge by using a stochastic gradient method on a Bayesian network
    Yamamura, M
    Onozuka, T
    IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 1998, : 2045 - 2050
  • [24] AUTOMATIC AND PARALLEL GENERATION OF GRADIENT AND HESSIAN MATRIX
    FISCHER, H
    LECTURE NOTES IN CONTROL AND INFORMATION SCIENCES, 1990, 143 : 104 - 114
  • [25] Using policy gradient reinforcement learning on autonomous robot controllers
    Grudic, GZ
    Kumar, V
    Ungar, L
    IROS 2003: PROCEEDINGS OF THE 2003 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-4, 2003, : 406 - 411
  • [26] Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method
    Gros, Sebastien
    Zanon, Mario
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1947 - 1952
  • [27] Molecule generation using transformers and policy gradient reinforcement learning
    Mazuz, Eyal
    Shtar, Guy
    Shapira, Bracha
    Rokach, Lior
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [28] Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
    Morimura, Tetsuro
    Uchibe, Eiji
    Yoshimoto, Junichiro
    Peters, Jan
    Doya, Kenji
    NEURAL COMPUTATION, 2010, 22 (02) : 342 - 376
  • [29] KERNEL-BASED LIFELONG POLICY GRADIENT REINFORCEMENT LEARNING
    Mowakeaa, Rami
    Kim, Seung-Jun
    Emge, Darren K.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3500 - 3504
  • [30] Cold-Start Reinforcement Learning with Softmax Policy Gradient
    Ding, Nan
    Soricut, Radu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30