Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Cited: 23
Authors
Ngo Anh Vien [1,2]
Yu, Hwanjo [1]
Chung, TaeChoong [2]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, Data Min Lab, Pohang, South Korea
[2] Kyung Hee Univ, Sch Elect & Informat, Dept Comp Engn, Artificial Intelligence Lab, Yongin 446701, Gyeonggi, South Korea
Keywords
Markov decision process; Reinforcement learning; Bayesian policy gradient; Monte-Carlo policy gradient; Policy gradient; Hessian matrix distribution;
DOI
10.1016/j.ins.2011.01.001
Chinese Library Classification
TP [automation technology; computer technology]
Discipline code
0812
Abstract
Bayesian policy gradient algorithms have recently been proposed to model the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates compared with conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that, with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10]. (C) 2011 Elsevier Inc. All rights reserved.
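The central idea of the abstract, replacing a fixed learning rate with a step scaled by a Hessian estimate of the performance measure, can be illustrated with a plain Monte-Carlo analogue. The sketch below is not the paper's Bayesian-quadrature method: it uses ordinary likelihood-ratio (score-function) estimators for both the gradient and the Hessian on a toy one-step Gaussian-policy problem, and the reward function, target value 2.0, and all parameter settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_episodes(theta, sigma, n):
    """One-step toy problem: Gaussian policy N(theta, sigma^2),
    reward r(a) = -(a - 2)^2, so the optimal policy mean is 2."""
    a = rng.normal(theta, sigma, size=n)
    r = -(a - 2.0) ** 2
    return a, r


def mc_gradient_and_hessian(theta, sigma, a, r):
    """Likelihood-ratio estimators of the performance measure J(theta):
        grad J ~= mean[ r * d/dtheta log pi(a|theta) ]
        hess J ~= mean[ r * ((d log pi)^2 + d^2 log pi) ]"""
    score = (a - theta) / sigma ** 2   # d/dtheta log N(a; theta, sigma^2)
    d2 = -1.0 / sigma ** 2             # d^2/dtheta^2 log N(a; theta, sigma^2)
    g = np.mean(r * score)
    h = np.mean(r * (score ** 2 + d2))
    return g, h


theta, sigma = 0.0, 0.5
for _ in range(30):
    a, r = sample_episodes(theta, sigma, 2000)
    g, h = mc_gradient_and_hessian(theta, sigma, a, r)
    if h < 0:
        # Newton-style step: the (negative-definite) Hessian acts as a
        # per-iteration learning rate schedule, analogous to the paper's idea.
        theta -= g / h
    else:
        theta += 0.1 * g  # fall back to plain gradient ascent
# theta should approach the optimal policy mean, 2.0
```

Because the true Hessian of this quadratic objective is constant, the Newton-style step converges essentially in one iteration up to Monte-Carlo noise, which is precisely the learning-rate-schedule benefit the abstract attributes to the Hessian matrix distribution.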
Pages: 1671-1685 (15 pages)