Temporal Abstraction in Reinforcement Learning with the Successor Representation

Cited by: 0
Authors
Machado, Marlos C. [1 ]
Barreto, Andre [2 ]
Precup, Doina [3 ]
Bowling, Michael [1 ]
Affiliations
[1] University of Alberta, Alberta Machine Intelligence Institute (Amii), Department of Computing Science, DeepMind, Edmonton, AB, Canada
[2] DeepMind, London, England
[3] McGill University, Quebec AI Institute (Mila), School of Computer Science, DeepMind, Montreal, QC, Canada
Keywords
Reinforcement learning; Options; Successor representation; Eigenoptions; Covering options; Option keyboard; Temporally-extended exploration; Slow feature analysis; Exploration; Framework; Level; MDPs
DOI
Not available
Chinese Library Classification
TP [Automation and computer technology]
Discipline code
0812
Abstract
Reasoning at multiple levels of temporal abstraction is one of the key attributes of intelligence. In reinforcement learning, this is often modeled through temporally extended courses of action called options. Options allow agents to make predictions and to operate at different levels of abstraction within an environment. Nevertheless, approaches based on the options framework often start with the assumption that a reasonable set of options is known beforehand. When this is not the case, there is no definitive answer as to which options one should consider. In this paper, we argue that the successor representation, which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. To support our claim, we take a big-picture view of recent results, showing how the successor representation can be used to discover options that facilitate either temporally-extended exploration or planning. We cast these results as instantiations of a general framework for option discovery in which the agent's representation is used to identify useful options, which are then used to further improve its representation. This results in a virtuous, never-ending cycle in which both the representation and the options are constantly refined based on each other. Beyond option discovery itself, we also discuss how the successor representation allows us to augment a set of options into a combinatorially large counterpart without additional learning. This is achieved through the combination of previously learned options. Our empirical evaluation focuses on options discovered for temporally-extended exploration and on the use of the successor representation to combine them. Our results shed light on important design decisions involved in the definition of options and demonstrate the synergy of different methods based on the successor representation, such as eigenoptions and the option keyboard.
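Illustrative sketch
The successor representation described in the abstract has a simple closed form in the tabular case, and its eigenvectors give the intrinsic rewards that define eigenoptions. The following NumPy sketch is illustrative only, not the authors' code: the 4-state chain, the uniform random policy, and gamma = 0.9 are assumptions made for this example.

import numpy as np

gamma = 0.9  # discount factor (assumed for this example)

# Transition matrix P[s, s'] of a 4-state chain under a uniform random
# policy: interior states move left or right with probability 0.5;
# endpoints stay put with probability 0.5 (assumed dynamics).
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])

# Successor representation:
# Psi[s, s'] = E[ sum_t gamma^t * 1{S_t = s'} | S_0 = s ],
# which in the tabular case equals (I - gamma * P)^-1.
Psi = np.linalg.inv(np.eye(4) - gamma * P)

# Eigenoptions: each eigenvector e of the SR induces an intrinsic
# reward r_int(s, s') = e[s'] - e[s]; an eigenoption is a policy that
# maximizes this reward, i.e., climbs the eigenvector.
eigvals, eigvecs = np.linalg.eig(Psi)
order = np.argsort(-eigvals.real)    # sort eigenvalues, largest first
e = eigvecs[:, order[1]].real        # first non-constant eigenvector
r_int = e[None, :] - e[:, None]      # intrinsic reward for each (s, s')

print("SR:\n", Psi.round(2))
print("eigenvector:", e.round(2))

The combination of options mentioned in the abstract (the option keyboard) is not shown here; it builds new behaviors from previously learned options via generalized policy evaluation and improvement over successor features, again without additional learning.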
Pages: 69