Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning

Times Cited: 1
Authors
Gao, Zijian [1 ]
Li, Yiying [2 ]
Xu, Kele [1 ]
Zhai, Yuanzhao [1 ]
Ding, Bo [1 ]
Feng, Dawei [1 ]
Mao, Xinjun [1 ]
Wang, Huaimin [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha 410000, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100073, Peoples R China
Keywords
Deep reinforcement learning; curiosity; exploration; intrinsic rewards
DOI
10.1109/TETCI.2023.3335944
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The sparsity of extrinsic rewards poses a significant challenge for deep reinforcement learning (DRL). As an alternative, researchers have focused on intrinsic rewards to improve exploration efficiency, with curiosity being one of the most representative approaches. However, designing effective intrinsic rewards remains difficult, as artificial curiosity differs significantly from human curiosity. In this article, we introduce a novel curiosity approach for DRL, named DyMeCu, which stands for Dynamic Memory-based Curiosity. Inspired by human curiosity and information theory, DyMeCu constructs a dynamic memory using the online learner following the bootstrap paradigm. Additionally, we design a two-learner architecture inspired by ensemble techniques to better capture curiosity. The information gap between the two learners serves as the intrinsic reward for agents, while state information is continually consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu better mimics human curiosity through a dynamic memory that grows dynamically under the bootstrap paradigm with two learners. Large-scale empirical experiments on multiple benchmarks, including the DeepMind Control Suite and the Atari Suite, demonstrate that DyMeCu outperforms competitive curiosity-based methods both with and without extrinsic rewards. On a subset of 26 Atari games, DyMeCu achieves a mean human-normalized score of 5.076, a 77.4% relative improvement over the best-performing baseline. On the DeepMind Control Suite, DyMeCu sets new state-of-the-art results on 11 of the 12 tasks when compared with curiosity-based methods and other pre-training strategies.
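The abstract describes the core mechanism only at a high level: two learners predict a bootstrap-updated memory representation, their information gap becomes the intrinsic reward, and visited states are consolidated into the memory. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the linear encoders, the EMA-based consolidation rule, and all names (`learner_a`, `intrinsic_reward`, `TAU`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEAT_DIM = 8, 4

def make_encoder():
    # Random linear map standing in for a trained neural encoder.
    return rng.normal(size=(STATE_DIM, FEAT_DIM))

# Two independently initialized online learners (a two-member ensemble).
learner_a = make_encoder()
learner_b = make_encoder()

# Dynamic memory: a slowly moving target, here an EMA over the learners'
# weights in the spirit of the bootstrap paradigm.
memory = 0.5 * (learner_a + learner_b)
TAU = 0.99  # EMA coefficient (hypothetical value)

def intrinsic_reward(state):
    """Curiosity bonus: gap between each learner's features and the
    memory's representation of the same state (treated as a fixed target)."""
    target = state @ memory
    err_a = np.mean((state @ learner_a - target) ** 2)
    err_b = np.mean((state @ learner_b - target) ** 2)
    # Large disagreement with the memory marks the state as novel.
    return 0.5 * (err_a + err_b)

def consolidate():
    """Consolidate current state knowledge into the dynamic memory
    by nudging it toward the online learners."""
    global memory
    memory = TAU * memory + (1 - TAU) * 0.5 * (learner_a + learner_b)

state = rng.normal(size=STATE_DIM)
r = intrinsic_reward(state)  # add to the extrinsic reward, if any
consolidate()                # memory grows to absorb the visited state
```

As the learners converge toward the memory on familiar states, the bonus for those states shrinks, so exploration pressure automatically shifts to unvisited regions.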
Pages: 1181-1193
Page count: 13
Related Papers
50 records
  • [1] Memory-based reinforcement learning algorithm for autonomous exploration in unknown environment
    Dooraki, Amir Ramezani
    Lee, Deok Jin
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2018, 15 (03):
  • [2] Memory-Based Explainable Reinforcement Learning
    Cruz, Francisco
    Dazeley, Richard
    Vamplew, Peter
    [J]. AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 66 - 77
  • [3] Hierarchical memory-based reinforcement learning
    Hernandez-Gardiol, N
    Mahadevan, S
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 1047 - 1053
  • [4] Memory-based Deep Reinforcement Learning for POMDPs
    Meng, Lingheng
    Gorbet, Rob
    Kulic, Dana
    [J]. 2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 5619 - 5626
  • [5] A tabular approach to memory-based learning
    [J]. Lin, C.-S. (linc@missouri.edu), 1600, Taylor and Francis Inc. (05):
  • [6] Curiosity-driven Exploration in Reinforcement Learning
    Gregor, Michal
    Spalek, Juraj
    [J]. 2014 ELEKTRO, 2014, : 435 - 440
  • [7] Reinforcement Learning Using a Stochastic Gradient Method with Memory-Based Learning
    Yamada, Takafumi
    Yamaguchi, Satoshi
    [J]. ELECTRICAL ENGINEERING IN JAPAN, 2010, 173 (01) : 32 - 40
  • [8] Study on LSTM and ConvLSTM Memory-Based Deep Reinforcement Learning
    Duarte, Fernando Fradique
    Lau, Nuno
    Pereira, Artur
    Reis, Luis Paulo
    [J]. AGENTS AND ARTIFICIAL INTELLIGENCE, ICAART 2023, 2024, 14546 : 223 - 243
  • [9] Self-Attention-Based Temporary Curiosity in Reinforcement Learning Exploration
    Hu, Hangkai
    Song, Shiji
    Huang, Gao
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (09): : 5773 - 5784
  • [10] Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control
    Choi, SPM
    Yeung, DY
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 945 - 951