Offline Meta-Reinforcement Learning with Online Self-Supervision

Cited by: 0
Authors
Pong, Vitchyr H. [1 ]
Nair, Ashvin [1 ]
Smith, Laura [1 ]
Huang, Catherine [1 ]
Levine, Sergey [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords: none listed
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels, to bridge this distribution shift. Because online collection requires no reward labels, this data can be much cheaper to collect. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional unsupervised online data collection leads to a dramatic improvement in the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks.
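The two-phase procedure the abstract describes can be sketched in a few lines. This is a minimal, illustrative toy, not the paper's implementation: the names (`meta_update`, `fit_reward_model`), the linear reward model, and the placeholder update rule are all assumptions introduced for illustration. The point is only the control flow: phase 1 meta-trains on reward-labeled offline data and fits a reward model from those labels; phase 2 continues meta-training on online data that arrives without rewards, using the learned reward model to supply surrogate labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset: per-task arrays of (state, action, reward)
# transitions, each labeled with rewards once, up front.
offline_data = {task: rng.normal(size=(64, 3)) for task in range(4)}

def fit_reward_model(data):
    """Least-squares fit r ~ w0*s + w1*a from the reward-labeled offline data."""
    X = np.vstack([d[:, :2] for d in data.values()])
    y = np.concatenate([d[:, 2] for d in data.values()])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def meta_update(theta, batch, lr=0.01):
    """Placeholder adaptation step: nudge policy params toward high-reward data."""
    s, a, r = batch[:, 0], batch[:, 1], batch[:, 2]
    return theta + lr * np.mean(r[:, None] * np.stack([s, a], axis=1), axis=0)

# Phase 1: offline meta-training on the reward-labeled dataset.
theta = np.zeros(2)
for data in offline_data.values():
    theta = meta_update(theta, data)
reward_model = fit_reward_model(offline_data)

# Phase 2: unsupervised online collection. Rollouts arrive WITHOUT reward
# labels; the reward model learned offline supplies surrogate labels so
# meta-training can continue on data from the current exploration policy,
# bridging the distribution shift between offline and online data.
for _ in range(10):
    online_sa = rng.normal(size=(64, 2))      # (state, action) pairs, no rewards
    surrogate_r = online_sa @ reward_model    # self-supervised reward labels
    batch = np.hstack([online_sa, surrogate_r[:, None]])
    theta = meta_update(theta, batch)
```

The design choice this mirrors is that online data only needs to be *collected*, never *labeled*, which is why the abstract argues it can be much cheaper than reward-labeled data.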
Pages: 19