Offline Meta-Reinforcement Learning with Online Self-Supervision

Cited by: 0
Authors
Pong, Vitchyr H. [1 ]
Nair, Ashvin [1 ]
Smith, Laura [1 ]
Huang, Catherine [1 ]
Levine, Sergey [1 ]
Institution
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords: (none listed)
DOI: (not available)
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Code
081104; 0812; 0835; 1405;
Abstract
Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels, to bridge this distribution shift. Because online collection requires no reward labels, this data can be much cheaper to gather. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional unsupervised online data collection leads to a dramatic improvement in the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks.
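The two-phase scheme described in the abstract can be sketched as follows. This is an illustrative toy sketch, not the paper's actual algorithm: the function names (`offline_phase`, `online_phase`, `surrogate_reward`) and the particular choice of self-supervised reward are assumptions made for the example.

```python
# Sketch of the hybrid scheme the abstract describes: phase 1 meta-trains on
# a fixed offline dataset whose transitions carry reward labels; phase 2 has
# the learned exploration policy collect extra online transitions WITHOUT
# reward labels, relabels them with a self-supervised surrogate reward, and
# continues meta-training on the relabeled data.

def surrogate_reward(state, action, next_state):
    # Self-supervised stand-in reward (hypothetical choice): progress along
    # the first state dimension. The real surrogate is method-specific.
    return next_state[0] - state[0]

def meta_train(policy, batch):
    # Stand-in for a gradient update; here we only count training transitions.
    policy["updates"] += len(batch)
    return policy

def offline_phase(policy, offline_tasks):
    # offline_tasks: per-task lists of reward-labeled (s, a, r, s') tuples.
    for task_batch in offline_tasks:
        policy = meta_train(policy, task_batch)
    return policy

def online_phase(policy, rollout_fn, n_rollouts):
    # Collect reward-free (s, a, s') transitions, relabel, keep training.
    relabeled = []
    for _ in range(n_rollouts):
        for (s, a, s2) in rollout_fn(policy):
            relabeled.append((s, a, surrogate_reward(s, a, s2), s2))
    return meta_train(policy, relabeled), relabeled

# Toy data: two offline tasks with three labeled transitions each.
offline_tasks = [[((0.0,), 0, 1.0, (1.0,))] * 3 for _ in range(2)]

def toy_rollout(policy):
    # Reward-free exploration rollout of length 2 (no reward labels).
    return [((0.0,), 0, (0.5,)), ((0.5,), 1, (1.5,))]

policy = {"updates": 0}
policy = offline_phase(policy, offline_tasks)
policy, online_data = online_phase(policy, toy_rollout, n_rollouts=4)
print(policy["updates"], len(online_data))
```

The key point of the sketch is that `online_phase` never consumes an externally supplied reward: every online transition is relabeled before training, which is what makes the extra collection cheap.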
Pages: 19
Related Papers
50 results
  • [1] State Augmentation via Self-Supervision in Offline Multiagent Reinforcement Learning
    Wang, Siying
    Li, Xiaodie
    Qu, Hong
    Chen, Wenyu
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (03) : 1051 - 1062
  • [2] Offline Meta-Reinforcement Learning for Industrial Insertion
    Zhao, Tony Z.
    Luo, Jianlan
    Sushkov, Oleg
    Pevceviciute, Rugile
    Heess, Nicolas
    Scholz, Jon
    Schaal, Stefan
    Levine, Sergey
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 6386 - 6393
  • [3] Offline Meta-Reinforcement Learning with Advantage Weighting
    Mitchell, Eric
    Rafailov, Rafael
    Peng, Xue Bin
    Levine, Sergey
    Finn, Chelsea
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] PAC-Bayesian offline Meta-reinforcement learning
    Sun, Zheng
    Jing, Chenheng
    Guo, Shangqi
    An, Lingling
    [J]. APPLIED INTELLIGENCE, 2023, 53 : 27128 - 27147
  • [5] PAC-Bayesian offline Meta-reinforcement learning
    Sun, Zheng
    Jing, Chenheng
    Guo, Shangqi
    An, Lingling
    [J]. APPLIED INTELLIGENCE, 2023, 53 (22) : 27128 - 27147
  • [6] Context Shift Reduction for Offline Meta-Reinforcement Learning
    Gao, Yunkai
    Zhang, Rui
    Guo, Jiaming
    Wu, Fan
    Yi, Qi
    Peng, Shaohui
    Lan, Siming
    Chen, Ruizhi
    Du, Zidong
    Hu, Xing
    Guo, Qi
    Li, Ling
    Chen, Yunji
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [7] Meta-reinforcement learning for the tuning of PI controllers: An offline approach
    McClement, Daniel G.
    Lawrence, Nathan P.
    Backstroem, Johan U.
    Loewen, Philip D.
    Forbes, Michael G.
    Gopaluni, R. Bhushan
    [J]. JOURNAL OF PROCESS CONTROL, 2022, 118 : 139 - 152
  • [8] Improving Spatiotemporal Self-supervision by Deep Reinforcement Learning
    Buechler, Uta
    Brattoli, Biagio
    Ommer, Bjoern
    [J]. COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 797 - 814
  • [9] Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations
    Zhou, Renzhe
    Gao, Chen-Xiao
    Zhang, Zongzhang
    Yu, Yang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024, : 17132 - 17140
  • [10] Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
    Yuan, Haoqi
    Lu, Zongqing
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,