Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model

Cited by: 9
Authors
Nguyen, Thanh [1 ]
Luu, Tung M. [1 ]
Vu, Thang [1 ]
Yoo, Chang D. [1 ]
Affiliation
[1] Korea Adv Inst Sci & Technol, Fac Elect Engn, Daejeon 34141, South Korea
Keywords
LEVEL; GO;
DOI
10.1109/IROS51168.2021.9636536
CLC number
TP [Automation Technology; Computer Technology];
Discipline code
0812 ;
Abstract
Developing a reinforcement learning (RL) agent capable of performing complex control tasks directly from high-dimensional observations such as raw pixels remains challenging, as the sample efficiency and generalization of RL algorithms still need improvement. This paper proposes a learning framework, the Curiosity Contrastive Forward Dynamics Model (CCFDM), for more sample-efficient RL directly from raw pixels. CCFDM incorporates a forward dynamics model (FDM) and performs contrastive learning to train its deep convolutional neural network-based image encoder (IE), extracting spatial and temporal information conducive to sample-efficient RL. In addition, during training CCFDM provides intrinsic rewards based on the FDM's prediction error, encouraging curiosity in the RL agent and improving exploration. The diverse, less-repetitive observations provided by this exploration strategy, together with the data augmentation available in contrastive learning, improve not only sample efficiency but also generalization. Existing model-free RL methods such as Soft Actor-Critic built on top of CCFDM outperform prior state-of-the-art pixel-based RL methods on the DeepMind Control Suite benchmark.
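The curiosity mechanism summarized in the abstract can be sketched as follows: the intrinsic reward is the forward dynamics model's prediction error in the encoder's latent space. This is a minimal illustrative sketch, not the paper's implementation; the function name `intrinsic_reward`, the toy linear forward model, and the `scale` factor are all assumptions introduced for illustration.

```python
import numpy as np

def intrinsic_reward(fdm, z_t, action, z_next, scale=1.0):
    """Curiosity bonus: the forward model's prediction error in latent space.

    fdm    : callable (latent, action) -> predicted next latent
    z_t    : latent encoding of the current observation (from the image encoder)
    action : action taken by the agent
    z_next : latent encoding of the observed next frame
    """
    z_pred = fdm(z_t, action)
    # Squared Euclidean distance between predicted and actual next latent:
    # large when the transition surprises the model, zero when predicted exactly.
    return scale * float(np.sum((z_pred - z_next) ** 2))

# Toy stand-in forward dynamics model: a fixed linear map on [latent; action].
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))  # latent dim 4, action dim 2

def toy_fdm(z, a):
    return W @ np.concatenate([z, a])

z_t = rng.standard_normal(4)
a_t = rng.standard_normal(2)
z_next = rng.standard_normal(4)

r_int = intrinsic_reward(toy_fdm, z_t, a_t, z_next)
```

Transitions the model predicts poorly yield a large bonus, steering the agent toward under-explored states; as the FDM improves on familiar transitions, their bonus decays toward zero.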
Pages: 3471-3477
Page count: 7
Related Papers
50 records
  • [41] Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control
    Qiu, Yunbo
    Zhan, Yuzhu
    Jin, Yue
    Wang, Jian
    Zhang, Xudong
    2022 IEEE 96TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2022-FALL), 2022,
  • [42] M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation
    Lygerakis, Fotios
    Dave, Vedant
    Rueckert, Elmar
    2024 21ST INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS, UR 2024, 2024, : 490 - 497
  • [43] Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
    Li, Gen
    Chen, Yuxin
    Chi, Yuejie
    Gu, Yuantao
    Wei, Yuting
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [44] Model-aided Deep Reinforcement Learning for Sample-efficient UAV Trajectory Design in IoT Networks
    Esrafilian, Omid
    Bayerlein, Harald
    Gesbert, David
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
  • [45] Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?
    Cui, Qiwen
    Yang, Lin F.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [46] Sample-efficient reinforcement learning with knowledge-embedded hybrid model for optimal control of mining industry
    Zheng, Jun
    Jia, Runda
    Liu, Shaoning
    He, Dakuo
    Li, Kang
    Wang, Fuli
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 254
  • [47] Affordance Learning from Play for Sample-Efficient Policy Learning
    Borja-Diaz, Jessica
    Mees, Oier
    Kalweit, Gabriel
    Hermann, Lukas
    Boedecker, Joschka
    Burgard, Wolfram
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 6372 - 6378
  • [48] Sample-efficient strategies for learning in the presence of noise
    Cesa-Bianchi, N
    Dichterman, E
    Fischer, P
    Shamir, E
    Simon, HU
    JOURNAL OF THE ACM, 1999, 46 (05) : 684 - 719
  • [49] Sample-efficient learning of interacting quantum systems
    Anurag Anshu
    Srinivasan Arunachalam
    Tomotaka Kuwahara
    Mehdi Soleimanifar
    Nature Physics, 2021, 17 : 931 - 935
  • [50] Sample-efficient learning of interacting quantum systems
    Anshu, Anurag
    Arunachalam, Srinivasan
    Kuwahara, Tomotaka
    Soleimanifar, Mehdi
    NATURE PHYSICS, 2021, 17 (08) : 931 - +