Variational Bayesian Parameter-Based Policy Exploration

Cited by: 1
Author
Hosino, Tikara [1]
Affiliation
[1] Nihon Unisys Ltd, Technol Res & Innovat, Koto Ku, 1-1-1 Toyosu, Tokyo, Japan
Keywords
Reinforcement Learning; Parameter-Based method; Bayesian Learning; Variational Approximation; Continuous Control; Exploration and Exploitation Trade-Off; GRADIENTS;
DOI
10.1109/ijcnn48605.2020.9207091
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning has succeeded in many tasks that cannot provide explicit training samples and can only provide rewards. However, because it lacks robustness and requires difficult hyperparameter tuning, reinforcement learning is not easily applied to many new situations. One reason for this problem is that existing methods do not account for the uncertainty of rewards and policy parameters. In this paper, for parameter-based policy exploration, we use a Bayesian method to define an objective function that explicitly accounts for reward uncertainty. In addition, we provide an algorithm that uses a Bayesian method to optimize this function under the uncertainty of policy parameters in continuous state and action spaces. The results of numerical experiments show that the proposed method is more robust than the comparison method against estimation errors on finite samples, because it balances reward acquisition and exploration.
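For readers unfamiliar with parameter-based policy exploration, the following is a minimal sketch of the plain, non-Bayesian update in the style of PGPE (Sehnke et al., 2008). It is NOT the variational Bayesian algorithm described in this abstract: it keeps a diagonal Gaussian over policy parameters, samples whole parameter vectors, and follows the likelihood-ratio gradient of the expected return, with no model of reward or parameter uncertainty. The function name `pgpe`, the toy `quadratic_reward`, the `target` vector, and all step sizes are hypothetical choices for illustration; scaling the step sizes by sigma squared is assumed here as a common PGPE simplification.

```python
import random


def pgpe(reward_fn, dim, iterations=300, samples=20, lr_mu=0.2,
         lr_sigma=0.05, sigma_init=1.0, seed=0):
    """Hypothetical minimal PGPE sketch: theta ~ N(mu, diag(sigma^2))."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [sigma_init] * dim
    for _ in range(iterations):
        # Parameter-based exploration: perturb the whole parameter vector,
        # then score each sampled policy with one return.
        eps = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(samples)]
        thetas = [[m + s * e for m, s, e in zip(mu, sigma, row)] for row in eps]
        rewards = [reward_fn(theta) for theta in thetas]
        baseline = sum(rewards) / samples  # batch-mean baseline reduces variance

        grad_mu = [0.0] * dim
        grad_sigma = [0.0] * dim
        for theta, r in zip(thetas, rewards):
            adv = r - baseline
            for j in range(dim):
                d = theta[j] - mu[j]
                # Likelihood-ratio gradients multiplied by sigma^2 (assumed step scaling):
                #   sigma^2 * dlogN/dmu    = d
                #   sigma^2 * dlogN/dsigma = (d^2 - sigma^2) / sigma
                grad_mu[j] += adv * d / samples
                grad_sigma[j] += adv * (d * d - sigma[j] ** 2) / sigma[j] / samples

        mu = [m + lr_mu * g for m, g in zip(mu, grad_mu)]
        # Keep the exploration width strictly positive.
        sigma = [max(1e-3, s + lr_sigma * g) for s, g in zip(sigma, grad_sigma)]
    return mu, sigma


# Toy problem (hypothetical): the best policy parameters are `target`.
target = [1.0, -2.0]


def quadratic_reward(theta):
    return -sum((x - t) ** 2 for x, t in zip(theta, target))


mu, sigma = pgpe(quadratic_reward, dim=2)
print("mu:", mu, "sigma:", sigma)
```

On this noise-free toy reward the mean moves toward `target` while sigma shrinks, so exploration fades as the return peak is located. How fast sigma should shrink when returns are noisy and estimated from finite samples is the exploration/exploitation question the abstract says its Bayesian objective is meant to handle.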
Pages: 7
Related papers
  • [1] Policy Gradients with Parameter-Based Exploration for Control
    Sehnke, Frank
    Osendorfer, Christian
    Rueckstiess, Thomas
    Graves, Alex
    Peters, Jan
    Schmidhuber, Juergen
    ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 387 - +
  • [2] Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
    Zhao, Tingting
    Hachiya, Hirotaka
    Tangkaratt, Voot
    Morimoto, Jun
    Sugiyama, Masashi
    NEURAL COMPUTATION, 2013, 25 (06) : 1512 - 1547
  • [3] A Policy Gradient with Parameter-based Exploration Approach for Zone-heating
    Van Vaerenbergh, Kevin
    De Hauwere, Yann-Michael
    Depraetere, Bruno
    Van Moffaert, Kristof
    Nowe, Ann
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015, : 556 - 563
  • [4] Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
    Tangkaratt, Voot
    Mod, Syogo
    Zhao, Tingting
    Morimoto, Jun
    Sugiyama, Masashi
    NEURAL NETWORKS, 2014, 57 : 128 - 140
  • [5] PARAMETER-BASED ASYMPTOTICS
    SWEETING, TJ
    BIOMETRIKA, 1992, 79 (02) : 219 - 230
  • [6] Parameter-based reduction of Gaussian mixture models with a variational-Bayes approach
    Bruneau, Pierrick
    Gelgon, Marc
    Picarougne, Fabien
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 144 - 147
  • [7] Variational Bayesian Exploration-Based Active Sarsa Algorithm
    Fu, Qiming
    Yang, Zhengxia
    Lu, You
    Wu, Hongjie
    Hu, Fuyuan
    Chen, Jianping
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (10)
  • [8] PARAMETER-BASED MASTER PRODUCTION SCHEDULING
    POHLEN, MF
    TICKNOR, JS
    APICS 32ND INTERNATIONAL CONFERENCE PROCEEDINGS : SOLUTIONS FOR PROGRESS, 1989, : 17 - 21
  • [9] Parameter-based morphometry of the wing of ilium
    Tufegdzic, Milica
    Arsic, Stojanka
    Trajanovic, Miroslav
    JOURNAL OF THE ANATOMICAL SOCIETY OF INDIA, 2015, 64 (02) : 145 - 151
  • [10] PARAMETER ESTIMATION OF GAUSSIAN MIXTURE MODEL BASED ON VARIATIONAL BAYESIAN LEARNING
    Zhao, Linchang
    Shang, Zhaowei
    Qin, Anyong
    Tang, Yuan Yan
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 1, 2018, : 99 - 104