Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Cited by: 0
Authors
Kulkarni, Tejas D. [1,4]
Narasimhan, Karthik R. [2]
Saeedi, Ardavan [2]
Tenenbaum, Joshua B. [3]
Affiliations
[1] DeepMind, London, England
[2] MIT, CSAIL, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] MIT, BCS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[4] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. One of the key difficulties is insufficient exploration, which leaves the agent unable to learn robust policies. Intrinsically motivated agents can explore new behaviors for their own sake rather than to directly solve external goals. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework that integrates hierarchical action-value functions, operating at different temporal scales, with goal-driven intrinsically motivated deep reinforcement learning. A top-level Q-value function learns a policy over intrinsic goals, while a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations, which provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse and delayed feedback: (1) a complex discrete decision process with stochastic transitions, and (2) the classic ATARI game 'Montezuma's Revenge'.
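For readers who want a concrete picture of the two-level control loop the abstract describes, the sketch below shows it in minimal tabular form: a meta-controller picks intrinsic goals, a controller picks atomic actions and is rewarded by a critic for reaching the current goal, and the meta-controller is trained on the extrinsic reward accumulated while that goal was being pursued. It replaces the paper's deep Q-networks with lookup tables; the environment interface (reset/step), the goal_reached critic, hashable states, and the hyperparameters are assumptions made only for this illustration.

import random
from collections import defaultdict

class HDQNSketch:
    """Tabular two-level hierarchy: a meta-controller chooses goals, a
    controller chooses atomic actions to satisfy the current goal."""

    def __init__(self, actions, goals, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.actions, self.goals = actions, goals
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_meta = defaultdict(float)  # Q2(state, goal)
        self.q_ctrl = defaultdict(float)  # Q1(state, goal, action)

    def pick_goal(self, s):
        # epsilon-greedy over goals using the meta-controller's value table
        if random.random() < self.epsilon:
            return random.choice(self.goals)
        return max(self.goals, key=lambda g: self.q_meta[(s, g)])

    def pick_action(self, s, g):
        # epsilon-greedy over atomic actions, conditioned on the current goal
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q_ctrl[(s, g, a)])

    def run_episode(self, env, goal_reached):
        # Assumed interface: env.reset() -> state,
        # env.step(action) -> (next_state, extrinsic_reward, done);
        # goal_reached(state, goal) plays the role of the critic that
        # emits the intrinsic reward.
        s, done = env.reset(), False
        while not done:
            g = self.pick_goal(s)              # meta-controller commits to a goal
            s_meta, extrinsic = s, 0.0
            while not done:
                a = self.pick_action(s, g)
                s2, r_ext, done = env.step(a)
                extrinsic += r_ext
                r_int = 1.0 if goal_reached(s2, g) else 0.0
                # controller learns from the intrinsic reward only
                best_next = max(self.q_ctrl[(s2, g, a2)] for a2 in self.actions)
                td = r_int + self.gamma * best_next - self.q_ctrl[(s, g, a)]
                self.q_ctrl[(s, g, a)] += self.alpha * td
                s = s2
                if r_int > 0:                  # goal satisfied: hand control back up
                    break
            # meta-controller learns from the extrinsic reward accumulated
            # while the controller pursued the chosen goal
            best_goal = max(self.q_meta[(s, g2)] for g2 in self.goals)
            td = extrinsic + self.gamma * best_goal - self.q_meta[(s_meta, g)]
            self.q_meta[(s_meta, g)] += self.alpha * td

In the paper both value functions are deep Q-networks trained from replay buffers; the nested-loop structure above, where the inner loop runs until the goal is reached and the outer update uses the accumulated extrinsic return, is the part this sketch is meant to make concrete.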
Pages: 9