Self-Supervised Reinforcement Learning that Transfers using Random Features

被引:0
|
作者
Chen, Boyuan [1 ]
Zhu, Chuning [2 ]
Agrawal, Pulkit [1 ]
Zhang, Kaiqing [3 ]
Gupta, Abhishek [2 ]
机构
[1] MIT, Boston, MA 02139 USA
[2] Univ Washington, Seattle, WA 98105 USA
[3] Univ Maryland, College Pk, MD 20742 USA
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Model-free reinforcement learning algorithms have exhibited great potential in solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but are known to be hard to generalize across tasks. Model-based RL, on the other hand, learns task-agnostic models of the world that naturally enables transfer across different reward functions, but struggles to scale to complex environments due to the compounding error. To get the best of both worlds, we propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards, while circumventing the challenges of model-based RL. In particular, we show self-supervised pretraining of model-free reinforcement learning with a number of random features as rewards allows implicit modeling of long-horizon environment dynamics. Then, planning techniques like model-predictive control using these implicit models enable fast adaptation to problems with new reward functions. Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks. We validate that our proposed method enables transfer across tasks on a variety of manipulation and locomotion domains in simulation, opening the door to generalist decision-making agents.
引用
收藏
页数:26
相关论文
共 50 条
  • [1] Self-Supervised Discovering of Interpretable Features for Reinforcement Learning
    Shi, Wenjie
    Huang, Gao
    Song, Shiji
    Wang, Zhuoyuan
    Lin, Tingyu
    Wu, Cheng
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (05) : 2712 - 2724
  • [2] Self-Supervised Regrasping using Spatio-Temporal Tactile Features and Reinforcement Learning
    Chebotar, Yevgen
    Hausman, Karol
    Su, Zhe
    Sukhatme, Gaurav S.
    Schaal, Stefan
    [J]. 2016 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2016), 2016, : 1960 - 1966
  • [3] Intrinsically Motivated Self-supervised Learning in Reinforcement Learning
    Zhao, Yue
    Du, Chenzhuang
    Zhao, Hang
    Li, Tiejun
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022, : 3605 - 3615
  • [4] Self-Supervised Reinforcement Learning for Recommender Systems
    Xin, Xin
    Karatzoglou, Alexandros
    Arapakis, Ioannis
    Jose, Joemon M.
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 931 - 940
  • [5] Reinforcement Learning with Attention that Works: A Self-Supervised Approach
    Manchin, Anthony
    Abbasnejad, Ehsan
    van den Hengel, Anton
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 223 - 230
  • [6] Self-Supervised Reinforcement Learning for Active Object Detection
    Fang, Fen
    Liang, Wenyu
    Wu, Yan
    Xu, Qianli
    Lim, Joo-Hwee
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (04): : 10224 - 10231
  • [7] Self-Supervised Attention-Aware Reinforcement Learning
    Wu, Haiping
    Khetarpa, Khimya
    Precup, Doina
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10311 - 10319
  • [8] Detecting galaxy tidal features using self-supervised representation learning
    Desmons, Alice
    Brough, Sarah
    Lanusse, Francois
    [J]. MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2024, 531 (04) : 4070 - 4084
  • [9] VICRegL: Self-Supervised Learning of Local Visual Features
    Bardes, Adrien
    Ponce, Jean
    LeCun, Yann
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [10] Self-Supervised Imitation for Offline Reinforcement Learning With Hindsight Relabeling
    Yu, Xudong
    Bai, Chenjia
    Wang, Changhong
    Yu, Dengxiu
    Chen, C. L. Philip
    Wang, Zhen
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, 53 (12): : 7732 - 7743