Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning

Times Cited: 1
Authors
Gao, Zijian [1 ]
Li, Yiying [2 ]
Xu, Kele [1 ]
Zhai, Yuanzhao [1 ]
Ding, Bo [1 ]
Feng, Dawei [1 ]
Mao, Xinjun [1 ]
Wang, Huaimin [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha 410000, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100073, Peoples R China
Keywords
Deep reinforcement learning; curiosity; exploration; intrinsic rewards
DOI
10.1109/TETCI.2023.3335944
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The sparsity of extrinsic rewards poses a significant challenge for deep reinforcement learning (DRL). As an alternative, researchers have focused on intrinsic rewards to improve exploration efficiency, with curiosity being one of the most representative approaches. However, designing effective intrinsic rewards remains difficult, as artificial curiosity differs significantly from human curiosity. In this article, we introduce a novel curiosity approach for DRL, named DyMeCu, which stands for Dynamic Memory-based Curiosity. Inspired by human curiosity and information theory, DyMeCu constructs a dynamic memory using the online learner following the bootstrap paradigm. Additionally, we design a two-learner architecture inspired by ensemble techniques to better capture curiosity. The information gap between the two learners serves as the intrinsic reward for agents, while state information is continually consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu better mimics human curiosity through a dynamic memory that grows dynamically under the bootstrap paradigm with two learners. Large-scale empirical experiments on multiple benchmarks, including the DeepMind Control Suite and the Atari Suite, demonstrate that DyMeCu outperforms competitive curiosity-based methods both with and without extrinsic rewards. On a subset of 26 Atari games, DyMeCu achieves a mean human-normalized score of 5.076, a 77.4% relative improvement over the best-performing baseline. On the DeepMind Control Suite, DyMeCu sets new state-of-the-art results on 11 of the 12 tasks when compared with curiosity-based methods and other pre-training strategies.
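The abstract describes the core mechanism only at a high level: two learners predict a bootstrap-updated memory representation, their information gap becomes the intrinsic reward, and visited states are consolidated into the memory. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the linear encoders, the EMA-based consolidation rule, and all names (`learner_a`, `intrinsic_reward`, `TAU`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEAT_DIM = 8, 4

def make_encoder():
    # Random linear map standing in for a trained neural encoder.
    return rng.normal(size=(STATE_DIM, FEAT_DIM))

# Two independently initialized online learners (a two-member ensemble).
learner_a = make_encoder()
learner_b = make_encoder()

# Dynamic memory: a slowly moving target, here an EMA over the learners'
# weights in the spirit of the bootstrap paradigm.
memory = 0.5 * (learner_a + learner_b)
TAU = 0.99  # EMA coefficient (hypothetical value)

def intrinsic_reward(state):
    """Curiosity bonus: gap between each learner's features and the
    memory's representation of the same state (treated as a fixed target)."""
    target = state @ memory
    err_a = np.mean((state @ learner_a - target) ** 2)
    err_b = np.mean((state @ learner_b - target) ** 2)
    # Large disagreement with the memory marks the state as novel.
    return 0.5 * (err_a + err_b)

def consolidate():
    """Consolidate current state knowledge into the dynamic memory
    by nudging it toward the online learners."""
    global memory
    memory = TAU * memory + (1 - TAU) * 0.5 * (learner_a + learner_b)

state = rng.normal(size=STATE_DIM)
r = intrinsic_reward(state)  # add to the extrinsic reward, if any
consolidate()                # memory grows to absorb the visited state
```

As the learners converge toward the memory on familiar states, the bonus for those states shrinks, so exploration pressure automatically shifts to unvisited regions.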
Pages: 1181-1193
Page count: 13
Related Papers
50 records
  • [1] Memory-based reinforcement learning algorithm for autonomous exploration in unknown environment
    Dooraki, Amir Ramezani
    Lee, Deok Jin
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2018, 15 (03):
  • [2] Memory-Based Explainable Reinforcement Learning
    Cruz, Francisco
    Dazeley, Richard
    Vamplew, Peter
    [J]. AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 66 - 77
  • [3] Hierarchical memory-based reinforcement learning
    Hernandez-Gardiol, N
    Mahadevan, S
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 1047 - 1053
  • [4] Memory-based Deep Reinforcement Learning for POMDPs
    Meng, Lingheng
    Gorbet, Rob
    Kulic, Dana
    [J]. 2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 5619 - 5626
  • [5] A tabular approach to memory-based learning
    [J]. Lin, C.-S. (linc@missouri.edu), 1600, Taylor and Francis Inc. (05):
  • [6] Curiosity-driven Exploration in Reinforcement Learning
    Gregor, Michal
    Spalek, Juraj
    [J]. 2014 ELEKTRO, 2014, : 435 - 440
  • [7] Reinforcement Learning Using a Stochastic Gradient Method with Memory-Based Learning
    Yamada, Takafumi
    Yamaguchi, Satoshi
    [J]. ELECTRICAL ENGINEERING IN JAPAN, 2010, 173 (01) : 32 - 40
  • [8] Study on LSTM and ConvLSTM Memory-Based Deep Reinforcement Learning
    Duarte, Fernando Fradique
    Lau, Nuno
    Pereira, Artur
    Reis, Luis Paulo
    [J]. AGENTS AND ARTIFICIAL INTELLIGENCE, ICAART 2023, 2024, 14546 : 223 - 243
  • [9] Self-Attention-Based Temporary Curiosity in Reinforcement Learning Exploration
    Hu, Hangkai
    Song, Shiji
    Huang, Gao
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (09): : 5773 - 5784
  • [10] Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control
    Choi, SPM
    Yeung, DY
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 945 - 951