Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning

Cited by: 1
Authors
Gao, Zijian [1 ]
Li, Yiying [2 ]
Xu, Kele [1 ]
Zhai, Yuanzhao [1 ]
Ding, Bo [1 ]
Feng, Dawei [1 ]
Mao, Xinjun [1 ]
Wang, Huaimin [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha 410000, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100073, Peoples R China
Keywords
Deep reinforcement learning; curiosity; exploration; intrinsic rewards
DOI
10.1109/TETCI.2023.3335944
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The sparsity of extrinsic rewards poses a significant challenge for deep reinforcement learning (DRL). As an alternative, researchers have turned to intrinsic rewards to improve exploration efficiency, with curiosity being one of the most representative approaches. Designing effective intrinsic rewards remains difficult, however, because artificial curiosity differs significantly from human curiosity. In this article, we introduce a novel curiosity approach for DRL, named DyMeCu (Dynamic Memory-based Curiosity). Inspired by human curiosity and information theory, DyMeCu constructs a dynamic memory from the online learner following the bootstrap paradigm. We further design a two-learner architecture, inspired by ensemble techniques, to better assess curiosity: the information gap between the two learners serves as the intrinsic reward for agents, while state information is continually consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu more closely mimics human curiosity through a memory that grows dynamically under the bootstrap paradigm with two learners. Large-scale experiments on multiple benchmarks, including the DeepMind Control Suite and the Atari Suite, demonstrate that DyMeCu outperforms competitive curiosity-based methods both with and without extrinsic rewards. On a subset of 26 Atari games, DyMeCu achieves a mean human-normalized score of 5.076, a 77.4% relative improvement over the strongest baseline. On the DeepMind Control Suite, DyMeCu sets new state-of-the-art results on 11 of 12 tasks compared with curiosity-based methods and other pre-training strategies.
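The mechanism described in the abstract (two learners distilling a bootstrapped memory, with their disagreement used as the curiosity bonus) can be made concrete in a few lines. The sketch below is a minimal illustration assuming a BYOL-style exponential-moving-average update for the memory network; the class and method names (`TwoLearnerCuriosity`, `consolidate`), the MLP architecture, and the EMA rule averaging both learners are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_encoder(obs_dim: int, feat_dim: int = 128, hidden: int = 256) -> nn.Sequential:
    """Small MLP encoder; the architecture here is an illustrative assumption."""
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, feat_dim),
    )


class TwoLearnerCuriosity(nn.Module):
    """Sketch of dynamic memory-based curiosity with two online learners.

    Both learners are trained to predict the frozen memory network's
    features; their disagreement (the "information gap") is the intrinsic
    reward, and the memory is consolidated toward the learners by an
    exponential moving average (bootstrap-style update).
    """

    def __init__(self, obs_dim: int, feat_dim: int = 128, ema_tau: float = 0.99):
        super().__init__()
        self.learner_a = make_encoder(obs_dim, feat_dim)
        self.learner_b = make_encoder(obs_dim, feat_dim)
        self.memory = make_encoder(obs_dim, feat_dim)
        for p in self.memory.parameters():  # memory is updated only via EMA
            p.requires_grad_(False)
        self.ema_tau = ema_tau

    @torch.no_grad()
    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-state curiosity bonus: squared feature gap between the learners.
        za, zb = self.learner_a(obs), self.learner_b(obs)
        return (za - zb).pow(2).mean(dim=-1)

    def learner_loss(self, obs: torch.Tensor) -> torch.Tensor:
        # Both learners distill the memory's representation of visited states.
        with torch.no_grad():
            target = self.memory(obs)
        return (F.mse_loss(self.learner_a(obs), target)
                + F.mse_loss(self.learner_b(obs), target))

    @torch.no_grad()
    def consolidate(self) -> None:
        # Bootstrap update: the memory slowly tracks the (averaged) learners,
        # so information about visited states accumulates in the memory.
        for m, a, b in zip(self.memory.parameters(),
                           self.learner_a.parameters(),
                           self.learner_b.parameters()):
            m.mul_(self.ema_tau).add_((1.0 - self.ema_tau) * 0.5 * (a + b))
```

In use, the agent would add `intrinsic_reward` on visited states to the extrinsic reward, minimize `learner_loss` on the same states, and call `consolidate()` after each optimization step. Whether the memory tracks one learner or an average of both is left open by the abstract, so the averaging above is only one plausible reading.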
Pages: 1181-1193 (13 pages)
Related Papers (50 items total; [11]-[20] shown)
• [11] Dynamic Memory-Based Continual Learning with Generating and Screening. Tao, Siying; Huang, Jinyang; Zhang, Xiang; Sun, Xiao; Gu, Yu. Artificial Neural Networks and Machine Learning, ICANN 2023, Pt III, 2023, 14256: 365-376.
• [12] Modelling Personalised Car-Following Behaviour: A Memory-Based Deep Reinforcement Learning Approach. Liao, Yaping; Yu, Guizhen; Chen, Peng; Zhou, Bin; Li, Han. Transportmetrica A: Transport Science, 2024, 20(01): 36-36.
• [13] Learning, Fast and Slow: A Goal-Directed Memory-Based Approach for Dynamic Environments. Tan, John Chong Min; Motani, Mehul. 2023 IEEE International Conference on Development and Learning (ICDL), 2023: 1-6.
• [14] Curiosity-Driven Acquisition of Sensorimotor Concepts Using Memory-Based Active Learning. Roa, Sergio; Kruijff, Geert-Jan M.; Jacobsson, Henrik. 2008 IEEE International Conference on Robotics and Biomimetics, Vols 1-4, 2009: 665-670.
• [15] Attention-Based Curiosity-Driven Exploration in Deep Reinforcement Learning. Reizinger, Patrik; Szemenyei, Marton. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 3542-3546.
• [16] Automatic HMI Structure Exploration via Curiosity-Based Reinforcement Learning. Cao, Yushi; Zheng, Yan; Lin, Shang-Wei; Liu, Yang; Teo, Yon Shin; Toh, Yuxuan; Adiga, Vinay Vishnumurthy. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021), 2021: 1151-1155.
• [17] A Memory-Based Reinforcement Learning Model Utilizing Macro-Actions. Murata, M.; Ozawa, S. Adaptive and Natural Computing Algorithms, 2005: 78-81.
• [18] A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems. Zheng, Lei; Cho, Siu-Yeung. Neural Processing Letters, 2011, 33(02): 187-200.
• [20] Random Curiosity-Driven Exploration in Deep Reinforcement Learning. Li, Jing; Shi, Xinxin; Li, Jiehao; Zhang, Xin; Wang, Junzheng. Neurocomputing, 2020, 418: 139-147.