Data-efficient Hindsight Off-policy Option Learning

Cited: 0
Authors
Wulfmeier, Markus [1 ]
Rao, Dushyant [1 ]
Hafner, Roland [1 ]
Lampe, Thomas [1 ]
Abdolmaleki, Abbas [1 ]
Hertweck, Tim [1 ]
Neunert, Michael [1 ]
Tirumala, Dhruva [1 ]
Siegel, Noah [1 ]
Heess, Nicolas [1 ]
Riedmiller, Martin [1 ]
Affiliations
[1] DeepMind, London, England
Keywords
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. Finally, we intuitively adapt the inference step to investigate the effect of increased temporal abstraction on training with pre-trained options and from scratch.
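The dynamic programming inference mentioned in the abstract can be illustrated with a minimal sketch: a forward pass over latent option choices, analogous to the forward algorithm of an HMM, whose log-sum-exp recursion is differentiable and hence admits backpropagation. The inputs `log_pi_o` (per-step action log-likelihood under each option) and `log_trans` (option switching log-probabilities) are hypothetical placeholders, not the paper's actual interfaces.

```python
import numpy as np

def forward_option_marginals(log_pi_o, log_trans):
    """Forward (alpha) recursion over option choices for one trajectory.

    log_pi_o:  (T, K) array; log-likelihood of the observed action under
               each of K options at each of T timesteps (assumed inputs).
    log_trans: (K, K) array; log-probability of switching from option i
               to option j between consecutive timesteps.

    Returns alpha of shape (T, K), where alpha[t, k] is the joint
    log-probability of the first t actions and option k at time t.
    Every operation is smooth, so an autodiff framework could
    backpropagate through this recursion end-to-end.
    """
    T, K = log_pi_o.shape
    alpha = np.full((T, K), -np.inf)
    alpha[0] = log_pi_o[0] - np.log(K)  # uniform initial option prior
    for t in range(1, T):
        # Marginalize over the previous option with a stable log-sum-exp.
        prev = alpha[t - 1][:, None] + log_trans  # (K, K) paths into each option
        m = prev.max(axis=0)
        alpha[t] = m + np.log(np.exp(prev - m).sum(axis=0)) + log_pi_o[t]
    return alpha
```

With uniform switching and action likelihoods of one, the forward messages sum to probability mass one at every step, which is a quick sanity check on the recursion.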
Pages: 11
Related papers
50 items total
  • [1] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [2] Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS With Unidentified Exosystem Dynamics
    Xu, Yong
    Wu, Zheng-Guang
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3181 - 3190
  • [3] Safe and efficient off-policy reinforcement learning
    Munos, Remi
    Stepleton, Thomas
    Harutyunyan, Anna
    Bellemare, Marc G.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [4] Flexible Data Augmentation in Off-Policy Reinforcement Learning
    Rak, Alexandra
    Skrynnik, Alexey
    Panov, Aleksandr I.
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT I, 2021, 12854 : 224 - 235
  • [5] Provably Efficient Neural GTD Algorithm for Off-policy Learning
    Wai, Hoi-To
    Yang, Zhuoran
    Wang, Zhaoran
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
    Yin, Ming
    Wang, Yu-Xiang
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [7] Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
    Kallus, Nathan
    Uehara, Masatoshi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [8] Efficient Off-policy Adversarial Imitation Learning with Imperfect Demonstrations
    Li, Jiangeng
    Zhao, Qishen
    Huang, Shuai
    Zuo, Guoyu
    [J]. PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 1692 - 1697
  • [9] Statistically Efficient Off-Policy Policy Gradients
    Kallus, Nathan
    Uehara, Masatoshi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [10] Forgetting and Imbalance in Robot Lifelong Learning with Off-Policy Data
    Zhou, Wenxuan
    Bohez, Steven
    Humplik, Jan
    Abdolmaleki, Abbas
    Rao, Dushyant
    Haarnoja, Tuomas
    Heess, Nicolas
    [J]. CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199, 2022, 199