Offline Meta-Reinforcement Learning with Online Self-Supervision

Cited by: 0
Authors
Pong, Vitchyr H. [1]
Nair, Ashvin [1]
Smith, Laura [1]
Huang, Catherine [1]
Levine, Sergey [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
None
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels, to bridge this distribution shift. Because online collection requires no reward labels, this data can be much cheaper to collect. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional unsupervised online data collection leads to a dramatic improvement in the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks.
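The abstract's two-phase procedure can be sketched in toy Python: phase 1 meta-trains on reward-labeled offline batches, phase 2 collects reward-free online rollouts and labels them with a learned reward model before continuing training. All function names and the trivial reward model below are illustrative placeholders, not the authors' implementation.

```python
import random

def meta_train_step(policy, batch):
    # Placeholder update: just count how many batches the policy has seen.
    policy["updates"] += 1
    return policy

def collect_unsupervised_rollout(policy, length=5):
    # Reward-free online data: observations and actions only, no reward labels.
    return [{"obs": random.random(), "action": random.random()}
            for _ in range(length)]

def label_with_learned_rewards(rollout):
    # A learned reward model (trivial stand-in here) fills in rewards,
    # so no ground-truth reward labels are needed during online collection.
    for step in rollout:
        step["reward"] = -abs(step["obs"] - step["action"])
    return rollout

def hybrid_offline_meta_rl(offline_batches, offline_steps=3, online_steps=2):
    policy = {"updates": 0}
    # Phase 1: meta-train on the static, reward-labeled offline dataset.
    for _ in range(offline_steps):
        for batch in offline_batches:
            policy = meta_train_step(policy, batch)
    # Phase 2: unsupervised online collection bridges the distribution
    # shift between the offline data and the learned exploration policy.
    for _ in range(online_steps):
        rollout = collect_unsupervised_rollout(policy)
        policy = meta_train_step(policy, label_with_learned_rewards(rollout))
    return policy
```

The sketch only illustrates the control flow; in the paper the online rollouts come from the meta-learned exploration strategy itself, which is exactly why their distribution differs from the offline data.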
Pages: 19
Related Papers
50 total
  • [31] Prefrontal cortex as a meta-reinforcement learning system
    Wang, Jane X.
    Kurth-Nelson, Zeb
    Kumaran, Dharshan
    Tirumala, Dhruva
    Soyer, Hubert
    Leibo, Joel Z.
    Hassabis, Demis
    Botvinick, Matthew
    NATURE NEUROSCIENCE, 2018, 21 (06) : 860 - +
  • [32] THE FEASIBILITY OF SELF-SUPERVISION
    Hudelson, Earl
    JOURNAL OF EDUCATIONAL RESEARCH, 1952, 45 (05): : 335 - 347
  • [33] Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
    Scholz, Julien
    Weber, Cornelius
    Hafez, Muhammad Burhan
    Wermter, Stefan
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [34] Some Considerations on Learning to Explore via Meta-Reinforcement Learning
    Stadie, Bradly C.
    Yang, Ge
    Houthooft, Rein
    Chen, Xi
    Duan, Yan
    Wu, Yuhuai
    Abbeel, Pieter
    Sutskever, Ilya
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [35] MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
    Guedon, Antoine
    Monnier, Tom
    Monasse, Pascal
    Lepetit, Vincent
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 940 - 951
  • [36] Self-distillation and self-supervision for partial label learning
    Yu, Xiaotong
    Sun, Shiding
    Tian, Yingjie
    PATTERN RECOGNITION, 2024, 146
  • [37] Semantic alignment with self-supervision for class incremental learning
    Fu, Zhiling
    Wang, Zhe
    Xu, Xinlei
    Yang, Mengping
    Chi, Ziqiu
    Ding, Weichao
    KNOWLEDGE-BASED SYSTEMS, 2023, 282
  • [38] Model-based Adversarial Meta-Reinforcement Learning
    Lin, Zichuan
    Thomas, Garrett
    Yang, Guangwen
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [39] Meta-Reinforcement Learning via Exploratory Task Clustering
    Chu, Zhendong
    Cai, Renqin
    Wang, Hongning
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10, 2024, : 11633 - 11641
  • [40] Meta-reinforcement learning for edge caching in vehicular networks
    Sakr, H.
    Elsabrouty, M.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (04) : 4607 - 4619