Improving Optimistic Exploration in Model-Free Reinforcement Learning

Cited by: 0
Authors
Grzes, Marek [1 ]
Kudenko, Daniel [1 ]
Affiliations
[1] Univ York, Dept Comp Sci, York YO10 5DD, N Yorkshire, England
Source
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The key problem in reinforcement learning is the exploration-exploitation tradeoff. Optimistic initialisation of the value function is a popular RL strategy. The problem with this approach is that the algorithm may still show relatively low performance after many episodes of learning. In this paper, two extensions to standard optimistic exploration are proposed. The first is based on a different initialisation of the value function of goal states. The second, which builds on the first, explicitly separates the propagation of low and high values in the state space. The proposed extensions show improvement over basic optimistic initialisation in empirical comparisons. Additionally, they improve anytime performance and help on domains where learning takes place on a sub-space of a large state space, that is, where the standard optimistic approach faces more difficulties.
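The abstract does not give implementation details, but the baseline it builds on (optimistic initialisation in tabular, model-free Q-learning) can be sketched briefly. The sketch below is an illustrative assumption, not the paper's method: the optimistic default value q_init, the Gym-like environment interface (reset, step, actions), and all hyperparameters are hypothetical, and the paper's two extensions (goal-state initialisation and separate propagation of low and high values) are not implemented here.

```python
from collections import defaultdict

def q_learning_optimistic(env, episodes=500, alpha=0.1, gamma=0.95, q_init=1.0):
    """Tabular Q-learning with optimistic initialisation (baseline sketch).

    q_init is chosen above the best achievable return so that unvisited
    state-action pairs look attractive and get tried. `env` is assumed to
    expose reset() -> state, step(a) -> (state, reward, done), and a list
    `env.actions`; these names are assumptions for illustration only.
    """
    Q = defaultdict(lambda: q_init)  # every unseen (state, action) starts optimistic
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Greedy selection: optimism alone drives exploration, because
            # untried actions still carry the high initial value.
            a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            target = r if done else r + gamma * max(Q[(s_next, b)] for b in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```

As the abstract notes, with this baseline the optimistic values decay only slowly through a large state space, which is the behaviour the paper's two extensions aim to improve.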
Pages: 360-369
Page count: 10