Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Cited by: 0
Authors
Kulkarni, Tejas D. [1 ,4 ]
Narasimhan, Karthik R. [2 ]
Saeedi, Ardavan [2 ]
Tenenbaum, Joshua B. [3 ]
Affiliations
[1] DeepMind, London, England
[2] MIT, CSAIL, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] MIT, BCS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[4] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. One of the key difficulties is insufficient exploration, resulting in an agent being unable to learn robust policies. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve external goals. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework that integrates hierarchical action-value functions, operating at different temporal scales, with goal-driven intrinsically motivated deep reinforcement learning. A top-level Q-value function learns a policy over intrinsic goals, while a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations, which provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse and delayed feedback: (1) a complex discrete decision process with stochastic transitions, and (2) the classic ATARI game 'Montezuma's Revenge'.
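The two-level decomposition described in the abstract can be sketched with plain tabular Q-learning on a toy chain environment. This is an illustrative stand-in, not the authors' deep-network implementation: the environment, reward values, step caps, and hyperparameters below are all assumptions chosen to keep the example small. The essential structure matches the abstract: a meta-level value function selects an intrinsic goal, and a lower-level controller is rewarded intrinsically for reaching that goal, while only the meta level learns from the (sparse) extrinsic reward.

```python
import random

random.seed(0)

# Toy 6-state chain (an illustrative stand-in for the paper's stochastic
# decision process): action 0 moves left deterministically; action 1 moves
# right with probability 0.5, otherwise the agent slips left.
N, ACTIONS = 6, 2

def step(s, a):
    if a == 1 and random.random() < 0.5:
        return min(s + 1, N - 1)
    return max(s - 1, 0)

# h-DQN's two temporal levels, here as plain Q-tables (goals = states):
#   meta_q[s][g]    -- top level: value of committing to goal g from state s
#   ctrl_q[g][s][a] -- lower level: value of action a while pursuing goal g
meta_q = [[0.0] * N for _ in range(N)]
ctrl_q = [[[0.0, 0.0] for _ in range(N)] for _ in range(N)]
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1

def pick(values, eps):
    """Epsilon-greedy selection with random tie-breaking."""
    if random.random() < eps:
        return random.randrange(len(values))
    best = max(values)
    return random.choice([i for i, v in enumerate(values) if v == best])

def run_episode(start=1, max_meta=10, max_steps=30):
    s = start
    for _ in range(max_meta):
        g = pick(meta_q[s], EPS)          # meta-controller chooses a goal
        s0, ext = s, 0.0
        for _ in range(max_steps):
            a = pick(ctrl_q[g][s], EPS)   # controller chooses an action
            s2 = step(s, a)
            # Intrinsic reward: 1 when the chosen goal state is reached.
            r_int = 1.0 if s2 == g else 0.0
            ctrl_q[g][s][a] += ALPHA * (
                r_int + GAMMA * max(ctrl_q[g][s2]) - ctrl_q[g][s][a])
            # Extrinsic reward (illustrative): 1 for reaching the far end.
            ext += 1.0 if s2 == N - 1 else 0.0
            s = s2
            if r_int == 1.0:
                break
        # Only the meta-controller learns from extrinsic reward.
        meta_q[s0][g] += ALPHA * (
            ext + GAMMA * max(meta_q[s]) - meta_q[s0][g])

for _ in range(2000):
    run_episode()
```

After training, the controller table for the far-end goal prefers moving right, even though the extrinsic reward alone is too sparse for flat epsilon-greedy Q-learning to find it reliably; the meta-controller discovers that committing to that goal yields extrinsic reward. Deep function approximators replace these tables in the actual h-DQN architecture.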
Pages: 9