Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Cited by: 0
Authors
Kulkarni, Tejas D. [1,4]
Narasimhan, Karthik R. [2]
Saeedi, Ardavan [2]
Tenenbaum, Joshua B. [3]
Affiliations
[1] DeepMind, London, England
[2] MIT, CSAIL, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] MIT, BCS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[4] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. One of the key difficulties is insufficient exploration, which leaves the agent unable to learn robust policies. Intrinsically motivated agents can explore new behaviors for their own sake rather than to directly solve external goals. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework that integrates hierarchical action-value functions, operating at different temporal scales, with goal-driven intrinsically motivated deep reinforcement learning. A top-level Q-value function learns a policy over intrinsic goals, while a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations, which provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse and delayed feedback: (1) a complex discrete decision process with stochastic transitions, and (2) the classic ATARI game 'Montezuma's Revenge'.
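For readers who want a concrete picture of the two-level control loop the abstract describes, the sketch below shows it in minimal tabular form: a meta-controller picks intrinsic goals, a controller picks atomic actions and is rewarded by a critic for reaching the current goal, and the meta-controller is trained on the extrinsic reward accumulated while that goal was being pursued. It replaces the paper's deep Q-networks with lookup tables; the environment interface (reset/step), the goal_reached critic, hashable states, and the hyperparameters are assumptions made only for this illustration.

import random
from collections import defaultdict

class HDQNSketch:
    """Tabular two-level hierarchy: a meta-controller chooses goals, a
    controller chooses atomic actions to satisfy the current goal."""

    def __init__(self, actions, goals, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.actions, self.goals = actions, goals
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_meta = defaultdict(float)  # Q2(state, goal)
        self.q_ctrl = defaultdict(float)  # Q1(state, goal, action)

    def pick_goal(self, s):
        # epsilon-greedy over goals using the meta-controller's value table
        if random.random() < self.epsilon:
            return random.choice(self.goals)
        return max(self.goals, key=lambda g: self.q_meta[(s, g)])

    def pick_action(self, s, g):
        # epsilon-greedy over atomic actions, conditioned on the current goal
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q_ctrl[(s, g, a)])

    def run_episode(self, env, goal_reached):
        # Assumed interface: env.reset() -> state,
        # env.step(action) -> (next_state, extrinsic_reward, done);
        # goal_reached(state, goal) plays the role of the critic that
        # emits the intrinsic reward.
        s, done = env.reset(), False
        while not done:
            g = self.pick_goal(s)              # meta-controller commits to a goal
            s_meta, extrinsic = s, 0.0
            while not done:
                a = self.pick_action(s, g)
                s2, r_ext, done = env.step(a)
                extrinsic += r_ext
                r_int = 1.0 if goal_reached(s2, g) else 0.0
                # controller learns from the intrinsic reward only
                best_next = max(self.q_ctrl[(s2, g, a2)] for a2 in self.actions)
                td = r_int + self.gamma * best_next - self.q_ctrl[(s, g, a)]
                self.q_ctrl[(s, g, a)] += self.alpha * td
                s = s2
                if r_int > 0:                  # goal satisfied: hand control back up
                    break
            # meta-controller learns from the extrinsic reward accumulated
            # while the controller pursued the chosen goal
            best_goal = max(self.q_meta[(s, g2)] for g2 in self.goals)
            td = extrinsic + self.gamma * best_goal - self.q_meta[(s_meta, g)]
            self.q_meta[(s_meta, g)] += self.alpha * td

In the paper both value functions are deep Q-networks trained from replay buffers; the nested-loop structure above, where the inner loop runs until the goal is reached and the outer update uses the accumulated extrinsic return, is the part this sketch is meant to make concrete.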
Pages: 9