SCORE: Simple Contrastive Representation and Reset-Ensemble for offline meta-reinforcement learning

Cited by: 0
Authors
Yang, Hanjie [1 ]
Lin, Kai [1 ]
Yang, Tao [1 ]
Sun, Guohan [1 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Technol, Dalian, Peoples R China
Keywords
Offline meta-reinforcement learning; Contrastive learning; Reset-Ensemble
DOI
10.1016/j.knosys.2024.112767
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline meta-reinforcement learning (OMRL) aims to train agents to quickly adapt to new tasks using only pre-collected data. However, existing OMRL methods often spend many training iterations without improvement and may suffer performance collapse in the later stages of training. We identify the root cause as the shallow memorization problem, in which agents overspecialize in specific solutions for encountered states, hindering their generalization performance. This issue arises from the loss of plasticity and the premature fitting of neural networks, which restrict the agents' exploration. To address this challenge, we propose Simple COntrastive Representation and Reset-Ensemble for OMRL (SCORE), a novel context-based OMRL approach. SCORE introduces an end-to-end contrastive learning framework without negative samples to pre-train a context encoder, enabling more robust task representations. The context encoder is then fine-tuned during meta-training. Furthermore, SCORE employs a Reset-Ensemble mechanism that periodically resets and ensembles partial networks to maintain the agents' continual learning ability and sharpen their perception of characteristics across diverse tasks. Extensive experiments demonstrate that SCORE effectively avoids premature fitting and exhibits excellent generalization performance.
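The abstract names two mechanisms: a negative-sample-free contrastive objective for pre-training the context encoder, and a Reset-Ensemble that periodically re-initializes parts of the network. The paper's exact formulations are not given in this record, so the following is only a minimal numpy sketch of the general ideas: a BYOL/SimSiam-style loss that pulls two views of the same batch together without negatives, and a toy ensemble whose members are re-drawn one at a time on a fixed period. All names (`negative_free_loss`, `ResetEnsemble`, `period`) are illustrative assumptions, not SCORE's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_sim(a, b):
    """Row-wise cosine similarity between two batches of vectors."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def negative_free_loss(z_pred, z_target):
    """BYOL/SimSiam-style objective: align two embeddings of the same
    transition batch; no negative pairs needed. z_target stands in for
    the stop-gradient branch (a plain copy, since this sketch has no
    autograd). Loss is 0 when the views already agree."""
    return float(np.mean(1.0 - cosine_sim(z_pred, z_target.copy())))

class ResetEnsemble:
    """Toy Reset-Ensemble schedule: keep n_members parameter vectors,
    average their outputs, and every `period` steps re-initialize the
    stalest member to restore plasticity without discarding the whole
    ensemble's knowledge at once."""

    def __init__(self, n_members=3, dim=8, period=100):
        self.params = [rng.normal(size=dim) for _ in range(n_members)]
        self.period = period
        self.step = 0
        self.next_reset = 0  # round-robin index of member to reset

    def predict(self, x):
        # Ensemble output = mean of per-member linear scores.
        return float(np.mean([p @ x for p in self.params]))

    def update(self):
        self.step += 1
        if self.step % self.period == 0:
            # Partial reset: only one member is re-drawn at a time.
            dim = len(self.params[self.next_reset])
            self.params[self.next_reset] = rng.normal(size=dim)
            self.next_reset = (self.next_reset + 1) % len(self.params)
```

The round-robin partial reset is one plausible reading of "periodically resets and ensembles partial networks"; the actual method may reset layers rather than ensemble members.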
Pages: 10