Bayesian Reinforcement Learning with Exploration

Authors
Lattimore, Tor [1]
Hutter, Marcus [2]
Affiliations
[1] Univ Alberta, Edmonton, AB T6G 2M7, Canada
[2] Australian Natl Univ, Canberra, ACT 0200, Australia
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
Pages: 170-184
Page count: 15
Related papers
50 in total
  • [41] BSE-MAML: Model Agnostic Meta-Reinforcement Learning via Bayesian Structured Exploration
    Wang, Haonan
    Zhang, Yiyun
    Feng, Dawei
    Li, Dongsheng
    Huang, Feng
    2020 IEEE 13TH INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2020), 2020, : 60 - 67
  • [42] ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning
    Gimelfarb, Michael
    Sanner, Scott
    Lee, Chi-Guhn
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 476 - 485
  • [43] Fast active learning for pure exploration in reinforcement learning
    Menard, Pierre
    Domingues, Omar Darwiche
    Kaufmann, Emilie
    Jonsson, Anders
    Leurent, Edouard
    Valko, Michal
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [44] Learning of deterministic exploration and temporal abstraction in reinforcement learning
    Shibata, Katsunari
    2006 SICE-ICASE International Joint Conference, Vols 1-13, 2006, : 2212 - 2217
  • [45] Shaping Bayesian Network Based Reinforcement Learning
    Song, Jiong
    Jin, Zhao
    2012 INTERNATIONAL CONFERENCE ON INDUSTRIAL CONTROL AND ELECTRONICS ENGINEERING (ICICEE), 2012, : 742 - 745
  • [46] Robust Reinforcement Learning with Bayesian Optimisation and Quadrature
    Paul, Supratik
    Chatzilygeroudis, Konstantinos
    Ciosek, Kamil
    Mouret, Jean-Baptiste
    Osborne, Michael A.
    Whiteson, Shimon
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [47] Approximate planning for bayesian hierarchical reinforcement learning
    Ngo Anh Vien
    Hung Ngo
    Lee, Sungyoung
    Chung, TaeChoong
    APPLIED INTELLIGENCE, 2014, 41 (03) : 808 - 819
  • [48] A Bayesian Posterior Updating Algorithm in Reinforcement Learning
    Xiong, Fangzhou
    Liu, Zhiyong
    Yang, Xu
    Sun, Biao
    Chiu, Charles
    Qiao, Hong
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT V, 2017, 10638 : 418 - 426
  • [49] Surveillance Evasion Through Bayesian Reinforcement Learning
    Cornell University, United States
    arXiv, 1600,
  • [50] Robust reinforcement learning with bayesian optimisation and quadrature
    Paul, Supratik
    Chatzilygeroudis, Konstantinos
    Ciosek, Kamil
    Mouret, Jean-Baptiste
    Osborne, Michael A.
    Whiteson, Shimon
    Journal of Machine Learning Research, 2020, 21