Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Cited by: 24

Authors:

Ngo Anh Vien ^{[1,2]}

Yu, Hwanjo ^{[1]}

Chung, TaeChoong ^{[2]}

Affiliations:

[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, Data Min Lab, Pohang, South Korea

[2] Kyung Hee Univ, Sch Elect & Informat, Dept Comp Engn, Artificial Intelligence Lab, Yongin 446701, Gyeonggi, South Korea

Source:

INFORMATION SCIENCES | 2011 / Vol. 181 / Issue 09

Keywords:

Markov decision process; Reinforcement learning; Bayesian policy gradient; Monte-Carlo policy gradient; Policy gradient; Hessian matrix distribution;

DOI:

10.1016/j.ins.2011.01.001

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates in comparison to conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimate. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10]. © 2011 Elsevier Inc. All rights reserved.
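
The idea of using a Hessian estimate in place of a scalar learning rate can be illustrated with a Newton-style update. The sketch below is only an illustration under stated assumptions: it does not implement the paper's Bayesian quadrature estimators, and the function name `newton_policy_step`, the `damping` parameter, and the toy quadratic objective are all hypothetical. It assumes a gradient estimate and a Hessian estimate are already available, symmetrizes the Hessian estimate (the abstract notes that symmetry is a key property), and takes one step toward a maximum.

```python
import numpy as np


def newton_policy_step(theta, grad, hess, damping=1e-6):
    """One Newton-style ascent step: theta - H^{-1} g.

    `grad` and `hess` stand in for estimates of the gradient and Hessian
    of the expected return (hypothetical inputs, not the paper's estimator).
    """
    # Enforce symmetry, which a noisy Hessian estimate may violate.
    hess_sym = 0.5 * (hess + hess.T)
    # Near a maximum the Hessian is negative definite; subtracting a small
    # multiple of the identity keeps the linear system well conditioned.
    hess_damped = hess_sym - damping * np.eye(theta.size)
    return theta - np.linalg.solve(hess_damped, grad)


# Toy concave objective J(theta) = -0.5 (theta - t*)^T A (theta - t*),
# used only to exercise the update (an assumption, not from the paper).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
theta_star = np.array([1.0, -2.0])


def toy_grad(theta):
    return -A @ (theta - theta_star)


def toy_hess(theta):
    return -A


theta0 = np.zeros(2)
theta1 = newton_policy_step(theta0, toy_grad(theta0), toy_hess(theta0))
```

On this exactly quadratic toy problem a single step lands essentially on the maximizer, because the inverse Hessian supplies the step size in every direction. With noisy sample-based estimates, as in reinforcement learning, the step would be less exact and stronger damping may be needed.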

Citation

Pages: 1671-1685

Number of pages: 15

50 records in total

[1] On the use of the policy gradient and Hessian in inverse reinforcement learning
Metelli, Alberto Maria
Pirotta, Matteo
Restelli, Marcello
INTELLIGENZA ARTIFICIALE, 2020, 14 (01) : 117 - 150
[2] Policy gradient fuzzy reinforcement learning
Wang, XN
Xu, X
He, HG
PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 992 - 995
[3] Hessian Aided Policy Gradient
Shen, Zebang
Hassani, Hamed
Mi, Chao
Qian, Hui
Ribeiro, Alejandro
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[4] Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning
Shen, Wanggang
Huan, Xun
COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2023, 416
[5] A modification of gradient policy in reinforcement learning procedure
Abas, Marcel
Skripcak, Tomas
2012 15TH INTERNATIONAL CONFERENCE ON INTERACTIVE COLLABORATIVE LEARNING (ICL), 2012,
[6] Adaptive Natural Policy Gradient in Reinforcement Learning
Li, Dazi
Qiao, Zengyuan
Song, Tianheng
Jin, Qibing
PROCEEDINGS OF 2018 IEEE 7TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE (DDCLS), 2018, : 605 - 610
[7] Policy Gradient Method For Robust Reinforcement Learning
Wang, Yue
Zou, Shaofeng
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[8] Reinforcement Learning to Rank with Pairwise Policy Gradient
Xu, Jun
Wei, Zeng
Xia, Long
Lan, Yanyan
Yin, Dawei
Cheng, Xueqi
Wen, Ji-Rong
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 509 - 518
[9] Scalable Multitask Policy Gradient Reinforcement Learning
El Bsat, Salam
Ammar, Haitham Bou
Taylor, Matthew E.
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1847 - 1853
[10] A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
Kim, Dong-Ki
Liu, Miao
Riemer, Matthew
Sun, Chuangchuang
Abdulhai, Marwa
Habibi, Golnaz
Lopez-Cot, Sebastian
Tesauro, Gerald
How, Jonathan P.
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
