Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning

Cited by: 1
Authors
Gao, Zijian [1 ]
Li, Yiying [2 ]
Xu, Kele [1 ]
Zhai, Yuanzhao [1 ]
Ding, Bo [1 ]
Feng, Dawei [1 ]
Mao, Xinjun [1 ]
Wang, Huaimin [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha 410000, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100073, Peoples R China
Keywords
Deep reinforcement learning; curiosity; exploration; intrinsic rewards;
DOI
10.1109/TETCI.2023.3335944
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The sparsity of extrinsic rewards poses a significant challenge for deep reinforcement learning (DRL). As an alternative, researchers have turned to intrinsic rewards to improve exploration efficiency, and one of the most representative approaches is to exploit curiosity. However, designing effective intrinsic rewards remains difficult, as artificial curiosity differs significantly from human curiosity. In this article, we introduce a novel curiosity approach for DRL, named DyMeCu, which stands for Dynamic Memory-based Curiosity. Inspired by human curiosity and information theory, DyMeCu constructs a dynamic memory with an online learner following the bootstrap paradigm. In addition, we design a two-learner architecture, inspired by ensemble techniques, to capture curiosity better: the information gap between the two learners serves as the intrinsic reward for the agent, while state information is continually consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu better mimics human curiosity through a memory that grows dynamically under the bootstrap paradigm with two learners. Large-scale empirical experiments on multiple benchmarks, including the DeepMind Control Suite and the Atari Suite, demonstrate that DyMeCu outperforms competitive curiosity-based methods both with and without extrinsic rewards. On a subset of 26 Atari games, DyMeCu achieves a mean human-normalized score of 5.076, a 77.4% relative improvement over the strongest baseline. On the DeepMind Control Suite, DyMeCu sets new state-of-the-art results on 11 of the 12 tasks compared with curiosity-based methods and other pre-training strategies.
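The abstract describes the two-learner bootstrap mechanism only at a high level. As a rough illustration, the following is a minimal PyTorch sketch of how such a module could be wired up, assuming the dynamic memory is an exponential-moving-average (EMA) copy of the learners, in the spirit of BYOL-style bootstrapping. The class and method names (DyMeCuSketch, consolidate, tau) are hypothetical, and the paper's actual architectures, losses, and update rules may differ.

```python
# Hypothetical sketch of a DyMeCu-style intrinsic reward module.
# Not the authors' released code; names and update rules are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an observation to a latent embedding."""
    def __init__(self, obs_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, obs):
        return self.net(obs)

class DyMeCuSketch(nn.Module):
    """Two online learners distill a slowly-moving memory encoder.

    The disagreement (information gap) between the learners is used as
    the intrinsic reward; state knowledge is consolidated into the
    memory via an EMA of the learners' weights (bootstrap paradigm)."""
    def __init__(self, obs_dim: int, tau: float = 0.99):
        super().__init__()
        self.learner_a = Encoder(obs_dim)
        self.learner_b = Encoder(obs_dim)
        self.memory = copy.deepcopy(self.learner_a)  # dynamic memory
        for p in self.memory.parameters():
            p.requires_grad_(False)
        self.tau = tau

    def intrinsic_reward(self, obs):
        """Per-sample information gap between the two learners."""
        with torch.no_grad():
            za, zb = self.learner_a(obs), self.learner_b(obs)
            return F.mse_loss(za, zb, reduction="none").mean(-1)

    def learner_loss(self, obs):
        """Both learners regress the memory's embedding of the state."""
        with torch.no_grad():
            target = self.memory(obs)
        return (F.mse_loss(self.learner_a(obs), target)
                + F.mse_loss(self.learner_b(obs), target))

    @torch.no_grad()
    def consolidate(self):
        """EMA update: fold the learners' knowledge into the memory."""
        for pm, pa, pb in zip(self.memory.parameters(),
                              self.learner_a.parameters(),
                              self.learner_b.parameters()):
            pm.mul_(self.tau).add_((pa + pb) / 2, alpha=1 - self.tau)
```

Under this reading, states already consolidated into the memory produce small disagreement between the learners, so the intrinsic reward naturally decays for familiar states while remaining high for novel ones.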
Pages: 1181-1193
Number of pages: 13
Related Papers
50 records in total
  • [21] A Theory for Memory-Based Learning
    Lin, J. H.
    Vitter, J. S.
    MACHINE LEARNING, 1994, 17 (2-3) : 143 - 167
  • [22] Memory-Based Reinforcement Learning for Trans-Domain Tiltrotor Robot Control
    Huo, Yujia
    Li, Yiping
    Feng, Xisheng
    2019 THE 10TH ASIA CONFERENCE ON MECHANICAL AND AEROSPACE ENGINEERING (ACMAE 2019), 2020, 1510
  • [23] Memory-based Deep Reinforcement Learning for Humanoid Locomotion under Noisy Scenarios
    Chenatti, Samuel
    Colombini, Esther L.
    2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 205 - 210
  • [24] A Memory-based Reinforcement Learning Algorithm for Partially Observable Markovian Decision Processes
    Zheng, Lei
    Cho, Siu-Yeung
    Quek, Chai
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 800 - 805
  • [25] A Memory-Based Approach to Navigation
    Crespi, B.
    Furlanello, C.
    Stringa, L.
    BIOLOGICAL CYBERNETICS, 1993, 69 (5-6) : 385 - 393
  • [26] A memory-based approach to learning shallow natural language patterns
    Argamon-Engelson, S.
    Dagan, I.
    Krymolowski, Y.
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 1999, 11 (03) : 369 - 390
  • [27] Penetration Strategy for High-Speed Unmanned Aerial Vehicles: A Memory-Based Deep Reinforcement Learning Approach
    Zhang, Xiaojie
    Guo, Hang
    Yan, Tian
    Wang, Xiaoming
    Sun, Wendi
    Fu, Wenxing
    Yan, Jie
    DRONES, 2024, 8 (07)
  • [28] Efficient exploration in reinforcement learning based on utile suffix memory
    Pchelkin, A.
    INFORMATICA, 2003, 14 (02) : 237 - 250
  • [29] Curiosity-based Topological Reinforcement Learning
    Hafez, Muhammad Burhan
    Kiong, Loo Chu
    2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 1979 - 1984
  • [30] Fast and slow curiosity for high-level exploration in reinforcement learning
    Bougie, Nicolas
    Ichise, Ryutaro
    APPLIED INTELLIGENCE, 2021, 51 (02) : 1086 - 1107