Variational Bayesian Parameter-Based Policy Exploration

Cited by: 1
Authors
Hosino, Tikara [1]
Affiliation
[1] Nihon Unisys Ltd, Technol Res & Innovat, Koto Ku, 1-1-1 Toyosu, Tokyo, Japan
Keywords
Reinforcement Learning; Parameter-Based method; Bayesian Learning; Variational Approximation; Continuous Control; Exploration and Exploitation Trade-Off; GRADIENTS;
DOI
10.1109/ijcnn48605.2020.9207091
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning has shown success in many tasks that cannot provide explicit training samples and can only provide rewards. However, because of a lack of robustness and the need for difficult hyperparameter tuning, reinforcement learning is not easily applicable to many new situations. One reason for this problem is that existing methods do not account for the uncertainties of rewards and policy parameters. In this paper, for parameter-based policy exploration, we use a Bayesian method to define an objective function that explicitly accounts for reward uncertainty. In addition, we provide an algorithm that uses a Bayesian method to optimize this function under the uncertainty of policy parameters in continuous state and action spaces. The results of numerical experiments show that the proposed method is more robust than the comparison method against estimation errors arising from finite samples, because it balances reward acquisition and exploration.
Pages: 7