Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning

Cited by: 1
Authors
Gao, Zijian [1 ]
Li, Yiying [2 ]
Xu, Kele [1 ]
Zhai, Yuanzhao [1 ]
Ding, Bo [1 ]
Feng, Dawei [1 ]
Mao, Xinjun [1 ]
Wang, Huaimin [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha 410000, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100073, Peoples R China
Keywords
Deep reinforcement learning; curiosity; exploration; intrinsic rewards
DOI
10.1109/TETCI.2023.3335944
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The sparsity of extrinsic rewards poses a significant challenge for deep reinforcement learning (DRL). As an alternative, researchers have turned to intrinsic rewards to improve exploration efficiency, with curiosity being one of the most representative approaches. Designing effective intrinsic rewards remains difficult, however, because artificial curiosity differs significantly from human curiosity. In this article, we introduce a novel curiosity approach for DRL, named DyMeCu (Dynamic Memory-based Curiosity). Inspired by human curiosity and information theory, DyMeCu constructs a dynamic memory from the online learner following the bootstrap paradigm. We further design a two-learner architecture, inspired by ensemble techniques, to better assess curiosity: the information gap between the two learners serves as the intrinsic reward for agents, while state information is continually consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu more closely mimics human curiosity through a memory that grows dynamically under the bootstrap paradigm with two learners. Large-scale experiments on multiple benchmarks, including the DeepMind Control Suite and the Atari Suite, demonstrate that DyMeCu outperforms competitive curiosity-based methods both with and without extrinsic rewards. On a subset of 26 Atari games, DyMeCu achieves a mean human-normalized score of 5.076, a 77.4% relative improvement over the strongest baseline. On the DeepMind Control Suite, DyMeCu sets new state-of-the-art results on 11 of 12 tasks compared with curiosity-based methods and other pre-training strategies.
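The mechanism described in the abstract (two learners distilling a bootstrapped memory, with their disagreement used as the curiosity bonus) can be made concrete in a few lines. The sketch below is a minimal illustration assuming a BYOL-style exponential-moving-average update for the memory network; the class and method names (`TwoLearnerCuriosity`, `consolidate`), the MLP architecture, and the EMA rule averaging both learners are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_encoder(obs_dim: int, feat_dim: int = 128, hidden: int = 256) -> nn.Sequential:
    """Small MLP encoder; the architecture here is an illustrative assumption."""
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, feat_dim),
    )


class TwoLearnerCuriosity(nn.Module):
    """Sketch of dynamic memory-based curiosity with two online learners.

    Both learners are trained to predict the frozen memory network's
    features; their disagreement (the "information gap") is the intrinsic
    reward, and the memory is consolidated toward the learners by an
    exponential moving average (bootstrap-style update).
    """

    def __init__(self, obs_dim: int, feat_dim: int = 128, ema_tau: float = 0.99):
        super().__init__()
        self.learner_a = make_encoder(obs_dim, feat_dim)
        self.learner_b = make_encoder(obs_dim, feat_dim)
        self.memory = make_encoder(obs_dim, feat_dim)
        for p in self.memory.parameters():  # memory is updated only via EMA
            p.requires_grad_(False)
        self.ema_tau = ema_tau

    @torch.no_grad()
    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-state curiosity bonus: squared feature gap between the learners.
        za, zb = self.learner_a(obs), self.learner_b(obs)
        return (za - zb).pow(2).mean(dim=-1)

    def learner_loss(self, obs: torch.Tensor) -> torch.Tensor:
        # Both learners distill the memory's representation of visited states.
        with torch.no_grad():
            target = self.memory(obs)
        return (F.mse_loss(self.learner_a(obs), target)
                + F.mse_loss(self.learner_b(obs), target))

    @torch.no_grad()
    def consolidate(self) -> None:
        # Bootstrap update: the memory slowly tracks the (averaged) learners,
        # so information about visited states accumulates in the memory.
        for m, a, b in zip(self.memory.parameters(),
                           self.learner_a.parameters(),
                           self.learner_b.parameters()):
            m.mul_(self.ema_tau).add_((1.0 - self.ema_tau) * 0.5 * (a + b))
```

In use, the agent would add `intrinsic_reward` on visited states to the extrinsic reward, minimize `learner_loss` on the same states, and call `consolidate()` after each optimization step. Whether the memory tracks one learner or an average of both is left open by the abstract, so the averaging above is only one plausible reading.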
Pages: 1181-1193 (13 pages)
Related Papers (50 items total; [11]-[20] shown)
• [11] Dynamic Memory-Based Continual Learning with Generating and Screening. Tao, Siying; Huang, Jinyang; Zhang, Xiang; Sun, Xiao; Gu, Yu. Artificial Neural Networks and Machine Learning, ICANN 2023, Pt III, 2023, 14256: 365-376.
• [12] Modelling Personalised Car-Following Behaviour: A Memory-Based Deep Reinforcement Learning Approach. Liao, Yaping; Yu, Guizhen; Chen, Peng; Zhou, Bin; Li, Han. Transportmetrica A: Transport Science, 2024, 20(01): 36-36.
• [13] Learning, Fast and Slow: A Goal-Directed Memory-Based Approach for Dynamic Environments. Tan, John Chong Min; Motani, Mehul. 2023 IEEE International Conference on Development and Learning (ICDL), 2023: 1-6.
• [14] Curiosity-Driven Acquisition of Sensorimotor Concepts Using Memory-Based Active Learning. Roa, Sergio; Kruijff, Geert-Jan M.; Jacobsson, Henrik. 2008 IEEE International Conference on Robotics and Biomimetics, Vols 1-4, 2009: 665-670.
• [15] Attention-Based Curiosity-Driven Exploration in Deep Reinforcement Learning. Reizinger, Patrik; Szemenyei, Marton. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 3542-3546.
• [16] Automatic HMI Structure Exploration via Curiosity-Based Reinforcement Learning. Cao, Yushi; Zheng, Yan; Lin, Shang-Wei; Liu, Yang; Teo, Yon Shin; Toh, Yuxuan; Adiga, Vinay Vishnumurthy. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021), 2021: 1151-1155.
• [17] A Memory-Based Reinforcement Learning Model Utilizing Macro-Actions. Murata, M.; Ozawa, S. Adaptive and Natural Computing Algorithms, 2005: 78-81.
• [18] A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems. Zheng, Lei; Cho, Siu-Yeung. Neural Processing Letters, 2011, 33(02): 187-200.
• [20] Random Curiosity-Driven Exploration in Deep Reinforcement Learning. Li, Jing; Shi, Xinxin; Li, Jiehao; Zhang, Xin; Wang, Junzheng. Neurocomputing, 2020, 418: 139-147.