Improving Optimistic Exploration in Model-Free Reinforcement Learning

Cited by: 0
Authors:
Grzes, Marek [1 ]
Kudenko, Daniel [1 ]
Affiliations:
[1] Univ York, Dept Comp Sci, York YO10 5DD, N Yorkshire, England
Source:
Keywords:
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
The key problem in reinforcement learning is the exploration-exploitation tradeoff. Optimistic initialisation of the value function is a popular RL strategy. The problem with this approach is that the algorithm may still have relatively low performance after many episodes of learning. In this paper, two extensions to standard optimistic exploration are proposed. The first is based on a different initialisation of the value function of goal states. The second, which builds on the first, explicitly separates the propagation of low and high values in the state space. The proposed extensions show improvement in empirical comparisons with basic optimistic initialisation. Additionally, they improve anytime performance and help in domains where learning takes place in a sub-space of a large state space, that is, where the standard optimistic approach faces more difficulties.
Pages: 360-369
Number of pages: 10
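
For context on the baseline strategy the abstract refers to, the following is a minimal sketch of optimistic initialisation in tabular Q-learning, written in Python. The constants R_MAX, GAMMA, and ALPHA and the helper names are illustrative assumptions, not taken from the paper, which does not include code.

from collections import defaultdict

# Assumed constants for illustration; the paper does not specify these.
R_MAX = 1.0                            # assumed upper bound on per-step reward
GAMMA = 0.95                           # discount factor
ALPHA = 0.1                            # learning rate
OPTIMISTIC_Q = R_MAX / (1.0 - GAMMA)   # upper bound on any discounted return

# Every unseen (state, action) pair starts at the optimistic upper bound,
# so greedy action selection is implicitly biased toward unexplored actions.
Q = defaultdict(lambda: OPTIMISTIC_Q)

def greedy_action(state, actions):
    # Pick the action with the highest Q-value; unexplored actions still
    # carry the optimistic bound, so they win against lower learned
    # estimates, which drives systematic exploration.
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next, actions, done):
    # Standard Q-learning backup; optimism decays as real returns propagate.
    target = r if done else r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

The paper's two extensions modify this baseline, for example by initialising goal-state values differently; the sketch above shows only the standard scheme that those extensions improve upon.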