Online Reinforcement Learning by Bayesian Inference

Authors
Xia, Zhongpu [1]
Zhao, Dongbin [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Keywords
GAUSSIAN-PROCESSES;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Policy evaluation has long been one of the core issues of online reinforcement learning, especially in continuous state domains. In this paper, the issue is addressed by employing Gaussian processes to represent the action value function from a probabilistic perspective. By modeling the return as a stochastic variable, the action value function can be updated sequentially by Bayesian inference from observed variables such as state and reward during policy evaluation. The update rule shows that the method is a temporal difference learning method whose learning rate is determined by the uncertainty of a collected sample. Combining this policy evaluation method with the ε-greedy action selection method, we propose an online reinforcement learning algorithm referred to as Bayesian-SARSA. It is tested on some benchmark problems, and the empirical results verify its effectiveness.
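The idea of a temporal difference update whose learning rate follows from uncertainty can be illustrated with a minimal sketch. It is not the paper's Bayesian-SARSA algorithm: as a simplifying assumption, it replaces the Gaussian process over continuous states with independent Gaussian beliefs (a mean and a variance) for each discrete state-action pair. The names `bayesian_td_update`, `epsilon_greedy`, and `noise_var` are hypothetical and chosen only for this illustration.

```python
import random


def epsilon_greedy(mean, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: mean[(state, a)])


def bayesian_td_update(mean, var, s, a, reward, s_next, a_next,
                       gamma=0.9, noise_var=1.0, terminal=False):
    """Gaussian (Kalman-style) update of the belief over Q(s, a).

    Assumption: each (state, action) pair has an independent Gaussian
    belief, and the SARSA target is observed with variance noise_var.
    The gain acts as the learning rate: it is large while the belief is
    uncertain and shrinks as the variance shrinks.
    """
    target = reward if terminal else reward + gamma * mean[(s_next, a_next)]
    gain = var[(s, a)] / (var[(s, a)] + noise_var)
    mean[(s, a)] += gain * (target - mean[(s, a)])  # TD step scaled by gain
    var[(s, a)] *= (1.0 - gain)                     # posterior variance
    return gain


if __name__ == "__main__":
    rng = random.Random(0)
    # One state with two actions; prior mean 0 and prior variance 1.
    mean = {(0, a): 0.0 for a in range(2)}
    var = {(0, a): 1.0 for a in range(2)}
    for step in range(5):
        action = epsilon_greedy(mean, 0, 2, epsilon=0.1, rng=rng)
        reward = 1.0 if action == 0 else 0.0  # toy one-step episode
        gain = bayesian_td_update(mean, var, 0, action, reward,
                                  None, None, terminal=True)
        print(step, action, round(gain, 3), round(mean[(0, action)], 3))
```

In this sketch the gain for a repeatedly visited pair decreases with each update, which mirrors the abstract's description of a learning rate determined by the uncertainty of a collected sample.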
Pages: 6
Related papers
50 in total
  • [1] Online reinforcement learning control by Bayesian inference
    Xia, Zhongpu
    Zhao, Dongbin
    [J]. IET CONTROL THEORY AND APPLICATIONS, 2016, 10 (12): : 1331 - 1338
  • [2] Lifelong Incremental Reinforcement Learning With Online Bayesian Inference
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 4003 - 4016
  • [3] Understanding Dialogue Acts by Bayesian Inference and Reinforcement Learning
    Matsushima, Akane
    Oka, Natsuki
    Fukada, Chie
    Tanaka, Kazuaki
    [J]. PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON HUMAN-AGENT INTERACTION (HAI'19), 2019, : 262 - 264
  • [4] Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
    Ramprasad, Pratik
    Li, Yuantong
    Yang, Zhuoran
    Wang, Zhaoran
    Sun, Will Wei
    Cheng, Guang
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (544) : 2901 - 2914
  • [5] Bayesian Inference and Online Learning in Poisson Neuronal Networks
    Huang, Yanping
    Rao, Rajesh P. N.
    [J]. NEURAL COMPUTATION, 2016, 28 (08) : 1503 - 1526
  • [6] Integrating Distributed Bayesian Inference and Reinforcement Learning for Sensor Management
    Grappiolo, Corrado
    Whiteson, Shimon
    Pavlin, Gregor
    Bakker, Bram
    [J]. FUSION: 2009 12TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION, VOLS 1-4, 2009, : 93 - +
  • [7] Online Bayesian Learning and Inference for OTHR Target Tracking and Registration
    Lan, Hua
    Mao, Yuxiang
    Wang, Zengfu
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 2983 - 2997
  • [8] Variational Inference MPC for Bayesian Model-based Reinforcement Learning
    Okada, Masashi
    Taniguchi, Tadahiro
    [J]. CONFERENCE ON ROBOT LEARNING, VOL 100, 2019, 100
  • [9] Direct policy search reinforcement learning based on variational Bayesian inference
    Yamaguchi, Nobuhiko
    Ihara, Kazuya
    Fukuda, Osamu
    Okumura, Hiroshi
    [J]. 2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 1009 - 1014
  • [10] Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference
    Yamaguchi, Nobuhiko
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2020, 24 (06) : 711 - 718