Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Cited by: 0
Authors
Barreto, Andre [1]
Borsa, Diana [1]
Quan, John [1]
Schaul, Tom [1]
Silver, David [1]
Hessel, Matteo [1]
Mankowitz, Daniel [1]
Zidek, Augustin [1]
Munos, Remi [1]
Affiliations
[1] DeepMind, London, England
Keywords
NETWORKS; GAME; GO
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SF&GPI framework in two ways. One of the basic assumptions underlying the original formulation of SF&GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SF&GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SF&GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent's set of skills, and thus be reused in the future.
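For readers unfamiliar with the mechanics summarised in the abstract, the sketch below illustrates how generalised policy improvement (GPI) selects actions on a new task from stored successor features (SFs). It is a minimal tabular illustration under simplifying assumptions: the array names (sf_stack, task_w) and the toy dimensions are hypothetical and not taken from the paper, whose agents learn SFs with deep networks from first-person image observations.
```python
import numpy as np

# Illustration of GPI over successor features. For each previously learned
# policy pi_i, sf_stack[i, s, a] approximates psi^{pi_i}(s, a): the expected
# discounted sum of features phi when following pi_i after taking action a
# in state s. If a new task's reward is (approximately) phi . w, then the
# action values of pi_i on that task are Q^{pi_i}(s, a) = psi^{pi_i}(s, a) . w.

def gpi_action(sf_stack: np.ndarray, w: np.ndarray, state: int) -> int:
    """Pick the GPI action for `state` on the task defined by weights `w`.

    sf_stack : shape (n_policies, n_states, n_actions, n_features),
               successor features of each stored policy.
    w        : shape (n_features,), reward weights of the new task.
    """
    # Action values of every stored policy in this state: (n_policies, n_actions).
    q_values = sf_stack[:, state] @ w
    # GPI: act greedily with respect to the maximum over the stored policies.
    return int(q_values.max(axis=0).argmax())

# Toy usage: 2 stored policies, 3 states, 2 actions, 4 features.
rng = np.random.default_rng(0)
sf_stack = rng.normal(size=(2, 3, 2, 4))
task_w = rng.normal(size=4)  # hypothetical weights of an unseen task
print(gpi_action(sf_stack, task_w, state=1))
```
The guarantee behind this rule is that the resulting policy performs at least as well as every stored policy on the new task; the paper's contribution is to extend this guarantee beyond rewards that are exactly linear in a fixed feature set.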
Pages: 10
Related Papers
50 results in total (first 10 shown)
  • [1] Successor Features for Transfer in Reinforcement Learning
    Barreto, Andre
    Dabney, Will
    Munos, Remi
    Hunt, Jonathan J.
    Schaul, Tom
    van Hasselt, Hado
    Silver, David
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [2] Transformed Successor Features for Transfer Reinforcement Learning
    Garces, Kiyoshige
    Xuan, Junyu
    Zuo, Hua
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II, 2024, 14472 : 298 - 309
  • [3] Risk-Aware Transfer in Reinforcement Learning using Successor Features
    Gimelfarb, Michael
    Barreto, Andre
    Sanner, Scott
    Lee, Chi-Guhn
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments
    Zhang, Jingwei
    Springenberg, Jost Tobias
    Boedecker, Joschka
    Burgard, Wolfram
    [J]. 2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 2371 - 2378
  • [5] Efficient Deep Reinforcement Learning via Policy-Extended Successor Feature Approximator
    Li, Yining
    Yang, Tianpei
    Hao, Jianye
    Zheng, Yan
    Tang, Hongyao
    [J]. DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2022, 2023, 13824 : 29 - 44
  • [6] Improvement of PMSM Control Using Reinforcement Learning Deep Deterministic Policy Gradient Agent
    Nicola, Marcel
    Nicola, Claudiu-Ionel
    [J]. 2021 21ST INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS (EE 2021), 2021
  • [7] PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
    Filos, Angelos
    Lyle, Clare
    Gal, Yarin
    Levine, Sergey
    Jaques, Natasha
    Farquhar, Gregory
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [8] Safety-Constrained Policy Transfer with Successor Features
    Feng, Zeyu
    Zhang, Bowen
    Bi, Jianxin
    Soh, Harold
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 7219 - 7225
  • [9] Efficient Deep Reinforcement Learning via Adaptive Policy Transfer
    Yang, Tianpei
    Hao, Jianye
    Meng, Zhaopeng
    Zhang, Zongzhang
    Hu, Yujing
    Chen, Yingfeng
    Fan, Changjie
    Wang, Weixun
    Liu, Wulong
    Wang, Zhaodong
    Peng, Jiajie
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3094 - 3100
  • [10] Adaptable automation with modular deep reinforcement learning and policy transfer
    Raziei, Zohreh
    Moghaddam, Mohsen
    [J]. Engineering Applications of Artificial Intelligence, 2021, 103