Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations

Cited: 0
|
Authors
Zhou, Renzhe
Gao, Chen-Xiao
Zhang, Zongzhang [1]
Yu, Yang
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
Funding
National Key R&D Program of China; US National Science Foundation;
Keywords
CONTEXT;
DOI
N/A
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generalization and sample efficiency have been longstanding issues in reinforcement learning, and thus Offline Meta-Reinforcement Learning (OMRL) has gained increasing attention for its potential to solve a wide range of problems with static and limited offline data. Existing OMRL methods often assume sufficient training tasks and data coverage to apply contrastive learning for extracting task representations. However, such assumptions do not hold in several real-world applications, which undermines the generalization ability of the representations. In this paper, we consider OMRL with two types of data limitations: limited training tasks and limited behavior diversity, and propose a novel algorithm called GENTLE for learning generalizable task representations in the face of data limitations. GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of the tasks. Unlike existing methods, TAE is optimized solely by reconstructing state transitions and rewards, which captures the generative structure of the task models and produces generalizable representations when training tasks are limited. To alleviate the effect of limited behavior diversity, we construct pseudo-transitions to align the data distribution used to train TAE with the data distribution encountered during testing. Empirically, GENTLE significantly outperforms existing OMRL methods on both in-distribution and out-of-distribution tasks under both the given-context protocol and the one-shot protocol.
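The abstract describes the core of GENTLE: an encoder maps a context of (s, a, s', r) transitions to a task embedding, and a decoder reconstructs the next state and reward from (s, a) and that embedding. The following is a minimal NumPy sketch of that reconstruction objective, not the authors' implementation; all network sizes, dimensions, and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    # One small randomly initialized weight matrix per layer.
    return [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    # Tanh hidden layers, linear output layer.
    for W in params[:-1]:
        x = np.tanh(x @ W)
    return x @ params[-1]

S, A, Z = 4, 2, 3  # state, action, and latent dims (illustrative)

# Encoder: per-transition features -> latent, mean-pooled into a task embedding z.
enc = mlp_init([S + A + S + 1, 32, Z])
# Decoder: predicts (s', r) from (s, a, z).
dec = mlp_init([S + A + Z, 32, S + 1])

def tae_loss(context):
    """context: (N, S+A+S+1) array of (s, a, s', r) transitions from one task."""
    z = mlp(enc, context).mean(axis=0)   # permutation-invariant pooling over context
    s_a = context[:, : S + A]
    target = context[:, S + A :]         # the (s', r) part to reconstruct
    dec_in = np.concatenate([s_a, np.tile(z, (len(context), 1))], axis=1)
    pred = mlp(dec, dec_in)
    return float(np.mean((pred - target) ** 2))  # reconstruction MSE

batch = rng.normal(size=(16, S + A + S + 1))
loss = tae_loss(batch)
print(loss)
```

In a full training loop, this reconstruction loss would be minimized jointly over the encoder and decoder; the paper's pseudo-transition construction would additionally reshape the context distribution fed to `tae_loss`, which this sketch does not cover.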
Pages: 17132-17140
Number of pages: 9
Related Papers
50 records
  • [1] Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
    Yuan, Haoqi
    Lu, Zongqing
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [2] Offline Meta-Reinforcement Learning for Industrial Insertion
    Zhao, Tony Z.
    Luo, Jianlan
    Sushkov, Oleg
    Pevceviciute, Rugile
    Heess, Nicolas
    Scholz, Jon
    Schaal, Stefan
    Levine, Sergey
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 6386 - 6393
  • [3] Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning
    Wang, Mingyang
    Bing, Zhenshan
    Yao, Xiangtong
    Wang, Shuai
    Kai, Huang
    Su, Hang
    Yang, Chenguang
    Knoll, Alois
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 10157 - 10165
  • [4] Offline Meta-Reinforcement Learning with Advantage Weighting
    Mitchell, Eric
    Rafailov, Rafael
    Peng, Xue Bin
    Levine, Sergey
    Finn, Chelsea
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] PAC-Bayesian offline Meta-reinforcement learning
    Sun, Zheng
    Jing, Chenheng
    Guo, Shangqi
    An, Lingling
    APPLIED INTELLIGENCE, 2023, 53 (22) : 27128 - 27147
  • [7] Context Shift Reduction for Offline Meta-Reinforcement Learning
    Gao, Yunkai
    Zhang, Rui
    Guo, Jiaming
    Wu, Fan
    Yi, Qi
    Peng, Shaohui
    Lan, Siming
    Chen, Ruizhi
    Du, Zidong
    Hu, Xing
    Guo, Qi
    Li, Ling
    Chen, Yunji
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] SCORE: Simple Contrastive Representation and Reset-Ensemble for offline meta-reinforcement learning
    Yang, Hanjie
    Lin, Kai
    Yang, Tao
    Sun, Guohan
    KNOWLEDGE-BASED SYSTEMS, 2025, 309
  • [9] Meta-reinforcement learning for the tuning of PI controllers: An offline approach
    McClement, Daniel G.
    Lawrence, Nathan P.
    Backstroem, Johan U.
    Loewen, Philip D.
    Forbes, Michael G.
    Gopaluni, R. Bhushan
    JOURNAL OF PROCESS CONTROL, 2022, 118 : 139 - 152
  • [10] Offline Meta-Reinforcement Learning with Online Self-Supervision
    Pong, Vitchyr H.
    Nair, Ashvin
    Smith, Laura
    Huang, Catherine
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,