Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Cited by: 24
Authors
Ngo Anh Vien [1, 2]
Yu, Hwanjo [1 ]
Chung, TaeChoong [2 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, Data Min Lab, Pohang, South Korea
[2] Kyung Hee Univ, Sch Elect & Informat, Dept Comp Engn, Artificial Intelligence Lab, Yongin 446701, Gyeonggi, South Korea
Keywords
Markov decision process; Reinforcement learning; Bayesian policy gradient; Monte-Carlo policy gradient; Policy gradient; Hessian matrix distribution;
DOI
10.1016/j.ins.2011.01.001
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates compared with conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, which is one of the important properties of the Hessian matrix. Moreover, we prove that, with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimate. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10]. (C) 2011 Elsevier Inc. All rights reserved.
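The abstract does not state the update rule, so the following is only an illustrative sketch of one common way a Hessian estimate can act as a learning rate schedule: a Newton-style step that rescales the gradient by the inverse Hessian. The toy quadratic performance function and all names (`newton_policy_step`, `toy_gradient`, `toy_hessian`, `A_TOY`, `B_TOY`) are assumptions made for illustration and are not taken from the paper; exact derivatives stand in for the Bayesian quadrature posterior means.

```python
def solve_2x2(m, v):
    """Solve m @ x = v for a 2x2 matrix m using Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("Hessian estimate is singular")
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def symmetrize(m):
    """Average a 2x2 matrix with its transpose (a no-op if already symmetric)."""
    off = 0.5 * (m[0][1] + m[1][0])
    return [[m[0][0], off], [off, m[1][1]]]


def newton_policy_step(theta, grad, hess):
    """Newton-style update theta - H^{-1} g (hypothetical helper)."""
    step = solve_2x2(symmetrize(hess), grad)
    return [t - s for t, s in zip(theta, step)]


# Assumed toy performance: J(theta) = -0.5 * theta^T A theta + b^T theta.
A_TOY = [[3.0, 1.0], [1.0, 2.0]]
B_TOY = [1.0, 1.0]


def toy_gradient(theta):
    """Exact gradient b - A theta of the toy performance."""
    return [B_TOY[i] - sum(A_TOY[i][j] * theta[j] for j in range(2))
            for i in range(2)]


def toy_hessian(theta):
    """Exact Hessian -A of the toy performance (constant in theta)."""
    return [[-A_TOY[i][j] for j in range(2)] for i in range(2)]


theta0 = [5.0, -4.0]
theta1 = newton_policy_step(theta0, toy_gradient(theta0), toy_hessian(theta0))
print(theta1, toy_gradient(theta1))
```

On this concave quadratic a single step lands on the stationary point, which is why the gradient printed for `theta1` is numerically zero. With noisy sample-based estimates, as in the setting the abstract describes, the step would only approximate that behavior.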
Pages: 1671 - 1685
Number of pages: 15
Related papers
50 in total
  • [41] Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference
    Yamaguchi, Nobuhiko
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2020, 24 (06) : 711 - 718
  • [42] Reinforcement Learning With Adaptive Policy Gradient Transfer Across Heterogeneous Problems
    Zhang, Gengzhi
    Feng, Liang
    Wang, Yu
    Li, Min
    Xie, Hong
    Tan, Kay Chen
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (03) : 2213 - 2227
  • [43] Policy ensemble gradient for continuous control problems in deep reinforcement learning
    Liu, Guoqiang
    Chen, Gang
    Huang, Victoria
    NEUROCOMPUTING, 2023, 548
  • [44] Risk-Sensitive Reinforcement Learning via Policy Gradient Search
    Prashanth, L. A.
    Fu, Michael C.
    FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2022, 15 (05) : 537 - 693
  • [45] Policy Gradient based Reinforcement Learning Approach for Autonomous Highway Driving
    Aradi, Szilard
    Becsi, Tamas
    Gaspar, Peter
    2018 IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS (CCTA), 2018 : 670 - 675
  • [46] Fuzzy policy gradient reinforcement learning for leader-follower systems
    Gu, Dongbing
    Yang, Erfu
    2005 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATIONS, VOLS 1-4, CONFERENCE PROCEEDINGS, 2005 : 1557 - 1561
  • [47] Reducing Transmission Delay in EDCA Using Policy Gradient Reinforcement Learning
    Shinzaki, Masao
    Koda, Yusuke
    Yamamoto, Koji
    Nishio, Takayuki
    Morikura, Masahiro
    2020 IEEE 17TH ANNUAL CONSUMER COMMUNICATIONS & NETWORKING CONFERENCE (CCNC 2020), 2020
  • [48] Policy Gradient Reinforcement Learning for I/O Reordering on Storage Servers
    Dheenadayalan, Kumar
    Srinivasaraghavan, Gopalakrishnan
    Muralidhara, V. N.
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 849 - 859
  • [49] Traffic Light Control with Policy Gradient-Based Reinforcement Learning
    Tas, Mehmet Bilge Han
    Ozkan, Kemal
    Saricicek, Inci
    Yazici, Ahmet
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024
  • [50] Natural policy gradient reinforcement learning for a CPG control of a biped robot
    Nakamura, Y
    Mori, T
    Ishii, S
    PARALLEL PROBLEM SOLVING FROM NATURE - PPSN VIII, 2004, 3242 : 972 - 981