A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

Cited by: 0
Authors
Garcia, Francisco M. [1]
Thomas, Philip S. [1]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01003 USA
Keywords
Reinforcement Learning; Hierarchical RL; Exploration
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. Specifically, we focus on the question of how the agent should explore when faced with a new environment. We show that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself, albeit with a different timescale. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed approach.
Pages: 1976-1978
Page count: 3
Related papers
50 in total
  • [31] Policy and Value Transfer in Lifelong Reinforcement Learning
    Abel, David
    Jinnai, Yuu
    Guo, Yue
    Konidaris, George
    Littman, Michael L.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [32] LIFELONG ROBOTIC REINFORCEMENT LEARNING BY RETAINING EXPERIENCES
    Xie, Annie
    Finn, Chelsea
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199, 2022, 199
  • [33] MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning
    Rajan, Raghu
    Diaz, Jessica Lizeth Borja
    Guttikonda, Suresh
    Ferreira, Fabio
    Biedenkapp, Andre
    von Hartz, Jan Ole
    Hutter, Frank
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2023, 77 : 821 - 890
  • [35] MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
    van der Pol, Elise
    Worrall, Daniel E.
    van Hoof, Herke
    Oliehoek, Frans A.
    Welling, Max
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [36] Generalized Inverse Reinforcement Learning with Linearly Solvable MDP
    Kohjima, Masahiro
    Matsubayashi, Tatsushi
    Sawada, Hiroshi
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT II, 2017, 10535 : 373 - 388
  • [37] Deep reinforcement learning for layout planning - An MDP-based approach for the facility layout problem
    Heinbach, Benjamin
    Burggraef, Peter
    Wagner, Johannes
    MANUFACTURING LETTERS, 2023, 38 : 40 - 43
  • [38] Exploration Entropy for Reinforcement Learning
    Xin, Bo
    Yu, Haixu
    Qin, You
    Tang, Qing
    Zhu, Zhangqing
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [39] Exploration in Structured Reinforcement Learning
    Ok, Jungseul
    Proutiere, Alexandre
    Tranos, Damianos
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [40] Exploration and Incentives in Reinforcement Learning
    Simchowitz, Max
    Slivkins, Aleksandrs
    OPERATIONS RESEARCH, 2024, 72 (03) : 983 - 998