Variational Bayesian Parameter-Based Policy Exploration

Cited by: 1
Authors
Hosino, Tikara [1]
Affiliations
[1] Nihon Unisys Ltd, Technol Res & Innovat, Koto Ku, 1-1-1 Toyosu, Tokyo, Japan
Keywords
Reinforcement Learning; Parameter-Based Method; Bayesian Learning; Variational Approximation; Continuous Control; Exploration and Exploitation Trade-Off; GRADIENTS
DOI
10.1109/ijcnn48605.2020.9207091
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning has shown success in many tasks that cannot provide explicit training samples and can only provide rewards. However, because of a lack of robustness and the need for difficult hyperparameter tuning, reinforcement learning is not easily applicable in many new situations. One reason for this problem is that existing methods do not account for the uncertainties of rewards and policy parameters. In this paper, for parameter-based policy exploration, we use a Bayesian method to define an objective function that explicitly accounts for reward uncertainty. In addition, we provide an algorithm that uses a Bayesian method to optimize this function under the uncertainty of policy parameters in continuous state and action spaces. The results of numerical experiments show that the proposed method is more robust than the comparison method against estimation errors on finite samples, because our proposal balances reward acquisition and exploration.
Pages: 7