Online reinforcement learning control by Bayesian inference

Cited by: 5
Authors
Xia, Zhongpu [1]
Zhao, Dongbin [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Source
IET CONTROL THEORY AND APPLICATIONS | 2016 / Vol. 10 / Issue 12
Funding
National Natural Science Foundation of China;
Keywords
learning systems; Bayes methods; Gaussian processes; optimal control; online reinforcement learning control; Bayesian inference; self-learning control; probability; action value function; Gaussian process; Bayesian-state-action-reward-state-action algorithm; AFFINE NONLINEAR-SYSTEMS; FEEDBACK-CONTROL; TIME-SYSTEMS; ALGORITHM; ITERATION;
DOI
10.1049/iet-cta.2015.0669
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Reinforcement learning offers a promising approach to self-learning control of an unknown system, but it raises the issues of policy evaluation and exploration, especially in continuous state spaces. In this study, these issues are addressed from a probabilistic perspective. The action value function is modelled as the latent variable of a Gaussian process, and the reward as the observed variable. An online approach is then proposed to update the action value function by Bayesian inference. Within the proposed framework, prior knowledge can be incorporated into the action value function, and an efficient exploration strategy is presented on that basis. Finally, the Bayesian-state-action-reward-state-action (Bayesian-SARSA) algorithm is tested on several benchmark problems, and the empirical results show its effectiveness.
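The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a zero-mean Gaussian process prior with an RBF kernel over state-action vectors, and a temporal-difference style observation model in which each reward is a noisy observation of Q(z_t) - gamma * Q(z_{t+1}). The function names, the kernel, the batch (rather than online) update, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def q_posterior(Z, r, Z_query, gamma=0.9, noise_var=0.1, length_scale=1.0):
    """Posterior mean and variance of Q at Z_query.

    Z       : (T + 1, d) visited state-action vectors z_0 .. z_T
    r       : (T,) rewards, r_t observed on the step z_t -> z_{t+1}
    Assumed model (hypothetical, for illustration):
        Q ~ GP(0, k),  r_t = Q(z_t) - gamma * Q(z_{t+1}) + noise
    """
    Z = np.asarray(Z, dtype=float)
    r = np.asarray(r, dtype=float)
    T = r.shape[0]
    if Z.shape[0] != T + 1:
        raise ValueError("Z must contain one more row than r has rewards")

    # H maps latent Q values at the visited points to expected rewards.
    H = np.zeros((T, T + 1))
    idx = np.arange(T)
    H[idx, idx] = 1.0
    H[idx, idx + 1] = -gamma

    K = rbf_kernel(Z, Z, length_scale)
    G = H @ K @ H.T + noise_var * np.eye(T)      # covariance of the rewards
    B = rbf_kernel(Z_query, Z, length_scale) @ H.T  # cov(Q(query), rewards)

    mean = B @ np.linalg.solve(G, r)
    # The RBF prior variance k(z, z) equals 1 for every z.
    var = 1.0 - np.einsum("ij,ji->i", B, np.linalg.solve(G, B.T))
    return mean, var


if __name__ == "__main__":
    Z = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]])
    r = np.array([0.0, 0.0, 1.0])
    mean, var = q_posterior(Z, r, Z_query=Z)
    print(mean, var)
```

Under these assumptions, the posterior variance shrinks near visited state-action pairs and stays at the prior value far from them, which is the kind of uncertainty estimate an exploration strategy could draw on. A zero prior mean is used here; a non-zero prior mean would be one way to encode prior knowledge about the action value function.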
Pages: 1331-1338
Page count: 8
Related papers
50 in total
  • [1] Online Reinforcement Learning by Bayesian Inference
    Xia, Zhongpu
    Zhao, Dongbin
    [J]. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
  • [2] Lifelong Incremental Reinforcement Learning With Online Bayesian Inference
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 4003 - 4016
  • [3] Understanding Dialogue Acts by Bayesian Inference and Reinforcement Learning
    Matsushima, Akane
    Oka, Natsuki
    Fukada, Chie
    Tanaka, Kazuaki
    [J]. PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON HUMAN-AGENT INTERACTION (HAI'19), 2019, : 262 - 264
  • [4] Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
    Ramprasad, Pratik
    Li, Yuantong
    Yang, Zhuoran
    Wang, Zhaoran
    Sun, Will Wei
    Cheng, Guang
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (544) : 2901 - 2914
  • [5] Bayesian Inference and Online Learning in Poisson Neuronal Networks
    Huang, Yanping
    Rao, Rajesh P. N.
    [J]. NEURAL COMPUTATION, 2016, 28 (08) : 1503 - 1526
  • [6] Integrating Distributed Bayesian Inference and Reinforcement Learning for Sensor Management
    Grappiolo, Corrado
    Whiteson, Shimon
    Pavlin, Gregor
    Bakker, Bram
    [J]. FUSION: 2009 12TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION, VOLS 1-4, 2009, : 93 - +
  • [7] Dual Control for Approximate Bayesian Reinforcement Learning
    Klenske, Edgar D.
    Hennig, Philipp
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [8] Online Bayesian Learning and Inference for OTHR Target Tracking and Registration
    Lan, Hua
    Mao, Yuxiang
    Wang, Zengfu
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 2983 - 2997
  • [9] Variational Inference MPC for Bayesian Model-based Reinforcement Learning
    Okada, Masashi
    Taniguchi, Tadahiro
    [J]. CONFERENCE ON ROBOT LEARNING, VOL 100, 2019, 100
  • [10] Direct policy search reinforcement learning based on variational Bayesian inference
    Yamaguchi, Nobuhiko
    Ihara, Kazuya
    Fukuda, Osamu
    Okumura, Hiroshi
    [J]. 2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 1009 - 1014