Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Authors
Barreto, Andre [1]
Borsa, Diana [1]
Quan, John [1]
Schaul, Tom [1]
Silver, David [1]
Hessel, Matteo [1]
Mankowitz, Daniel [1]
Zidek, Augustin [1]
Munos, Remi [1]
Affiliation
[1] DeepMind, London, England
Keywords
NETWORKS; GAME; GO
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SF&GPI framework in two ways. One of the basic assumptions underlying the original formulation of SF&GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SF&GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SF&GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent's set of skills, and thus be reused in the future.
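The abstract relies on two mechanisms without spelling them out. In the SF&GPI framework, the value of a stored policy pi_i on a task with reward weights w is the dot product of its successor features with w, Q(s, a) = psi(s, a) . w, and generalised policy improvement (GPI) acts greedily with respect to the maximum of these values over all stored policies. The sketch below illustrates both ideas in a tiny tabular setting, plus the paper's second idea of using earlier tasks' reward functions as the feature vector phi. It is a minimal illustration under stated assumptions, not the authors' deep-learning implementation: the nested-list layout `psis[i][s][a]`, the helper names `q_from_sf`, `gpi_action` and `reward_features`, and all numbers are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical tabular layout (an assumption for illustration, not the paper's
# deep-network setup): psis[i][s][a] is the d-dimensional successor-feature
# vector psi^{pi_i}(s, a) of the i-th previously learned policy.
SF = List[List[List[float]]]
RewardFn = Callable[[int, int, int], float]


def q_from_sf(psi_sa: Sequence[float], w: Sequence[float]) -> float:
    """Action value of a stored policy on task w: Q(s, a) = psi(s, a) . w."""
    return sum(p * x for p, x in zip(psi_sa, w))


def gpi_action(psis: Sequence[SF], s: int, w: Sequence[float]) -> int:
    """Generalised policy improvement: argmax_a max_i psi^{pi_i}(s, a) . w."""
    n_actions = len(psis[0][s])
    best_a, best_q = 0, float("-inf")
    for a in range(n_actions):
        # Best value any stored policy promises for taking action a in state s.
        q = max(q_from_sf(psi[s][a], w) for psi in psis)
        if q > best_q:
            best_a, best_q = a, q
    return best_a


def reward_features(reward_fns: Sequence[RewardFn], s: int, a: int,
                    s_next: int) -> List[float]:
    """Features built from earlier tasks' rewards (hypothetical helper):
    phi(s, a, s') = (r_1(s, a, s'), ..., r_d(s, a, s'))."""
    return [r(s, a, s_next) for r in reward_fns]


# Toy example: one state, three actions, two base tasks, and two stored
# policies, each assumed to have been trained on one of the base tasks.
psi_pi0 = [[[1.0, 0.0], [0.5, 0.2], [0.0, 0.0]]]
psi_pi1 = [[[0.0, 0.1], [0.3, 0.3], [0.0, 1.0]]]
psis = [psi_pi0, psi_pi1]

# New task whose reward is assumed to be 0.4 * r_0 + 0.6 * r_1, so w = [0.4, 0.6].
print(gpi_action(psis, 0, [0.4, 0.6]))  # prints 2
```

Because the features here are the base tasks' rewards, a new task whose reward is a linear combination of them is described entirely by its coefficient vector w, and GPI can choose actions for it immediately from the stored successor features, without any further learning. In practice w would have to be estimated from observed rewards; this sketch simply assumes it is known.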
Pages: 10