Offline Meta-Reinforcement Learning with Online Self-Supervision

Citations: 0
Authors:
Pong, Vitchyr H. [1 ]
Nair, Ashvin [1 ]
Smith, Laura [1 ]
Huang, Catherine [1 ]
Levine, Sergey [1 ]
Affiliation:
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords: (none listed)
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond those of online meta-RL or standard offline RL. Meta-RL learns both an exploration strategy that collects data for adaptation and a policy that quickly adapts to data from a new task. Because this policy is meta-trained on a fixed, offline dataset, it can behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a hybrid offline meta-RL algorithm that uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels, to bridge this distribution shift. Because no reward labels are required for online collection, this data can be much cheaper to collect. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that additional unsupervised online data collection dramatically improves the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks.
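The abstract outlines a two-phase recipe: reward-labeled offline meta-training, then reward-free online collection that closes the gap between the offline data and the learned exploration policy's own rollouts. The sketch below illustrates only that control flow and is not the paper's implementation: `encode_context`, `policy`, the placeholder dataset and dynamics, and the idea of relabeling online rollouts with a reward model fit offline are all assumptions introduced here for illustration.

```python
"""Illustrative sketch of the two-phase procedure described in the abstract.

NOT the paper's implementation: the task encoder, policy, dataset, and
dynamics below are hypothetical numpy stand-ins chosen only to make the
control flow concrete.
"""
import numpy as np

rng = np.random.default_rng(0)
NUM_TASKS, STATE_DIM = 10, 4


def encode_context(transitions):
    # Stand-in task encoder (assumption: a latent-context design in the
    # spirit of PEARL): summarize adaptation data as its mean vector.
    return transitions.mean(axis=0)


def policy(state, task_embedding):
    # Stand-in adaptive policy conditioned on the inferred task.
    return np.tanh(state + task_embedding)


# Phase 1: offline meta-training on a static, reward-labeled dataset.
# Each task's data is labeled once with that task's rewards and reused
# across meta-training; here it is random placeholder data.
offline_data = {t: rng.normal(size=(128, STATE_DIM)) for t in range(NUM_TASKS)}
for step in range(100):
    task = rng.integers(NUM_TASKS)
    z = encode_context(offline_data[task])  # infer the task from offline data
    # ... an actor-critic meta-RL update on (offline_data[task], z) goes here ...

# Phase 2: unsupervised online collection, with no reward labels.
# Rollouts come from the learned exploration behavior itself, so they cover
# the data distribution the adapted policy will actually face at test time,
# bridging the shift between offline data and self-collected data.
for episode in range(50):
    task = rng.integers(NUM_TASKS)
    z = encode_context(offline_data[task])
    state = rng.normal(size=STATE_DIM)
    rollout = []
    for t in range(20):
        action = policy(state, z)  # explore with the learned policy
        state = state + 0.1 * action + 0.01 * rng.normal(size=STATE_DIM)
        rollout.append((state.copy(), action))
    # No environment rewards are observed online. One plausible choice
    # (an assumption, not stated in the abstract) is to relabel these
    # rollouts with a reward model fit to the offline data and then
    # continue meta-training on the relabeled transitions.
```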
Pages: 19
Related Papers
50 records in total (entries [41]-[50] shown below)
  • [41] Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks
    Schoettler, Gerrit
    Nair, Ashvin
    Ojea, Juan Aparicio
    Levine, Sergey
    Solowjow, Eugen
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 9728-9735
  • [42] Meta-Reinforcement Learning for Multiple Traffic Signals Control
    Lou, Yican
    Wu, Jia
    Ran, Yunchuan
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 4264-4268
  • [43] Dynamic Channel Access via Meta-Reinforcement Learning
    Lu, Ziyang
    Gursoy, M. Cenk
2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021
  • [44] Self-Adaptive Server Anomaly Detection Using Ensemble Meta-Reinforcement Learning
    Chang, Bao Rong
    Tsai, Hsiu-Fen
    Chen, Guan-Ru
    ELECTRONICS, 2024, 13 (12)
  • [45] Doubly Robust Augmented Transfer for Meta-Reinforcement Learning
    Jiang, Yuankun
    Kan, Nuowen
    Li, Chenglin
    Dai, Wenrui
    Zou, Junni
    Xiong, Hongkai
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [46] Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning
    Beyene, Sofanit Wubeshet
    Han, Ji-Hyeong
    ELECTRONICS, 2022, 11 (24)
  • [47] Wireless Power Control via Meta-Reinforcement Learning
    Lu, Ziyang
    Gursoy, M. Cenk
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022: 1562-1567
  • [48] CoLES: Contrastive Learning for Event Sequences with Self-Supervision
    Babaev, Dmitrii
    Ovsov, Nikita
    Kireev, Ivan
    Ivanova, Maria
    Gusev, Gleb
    Nazarov, Ivan
    Tuzhilin, Alexander
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022: 1190-1199
  • [49] Taming MAML: Efficient Unbiased Meta-Reinforcement Learning
    Liu, Hao
    Socher, Richard
    Xiong, Caiming
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [50] A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
    Liu, Bo
    Feng, Xidong
    Ren, Jie
    Mai, Luo
    Zhu, Rui
    Zhang, Haifeng
    Wang, Jun
    Yang, Yaodong
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022