Data-efficient Hindsight Off-policy Option Learning

Cited by: 0
Authors
Wulfmeier, Markus [1]
Rao, Dushyant [1]
Hafner, Roland [1]
Lampe, Thomas [1]
Abdolmaleki, Abbas [1]
Hertweck, Tim [1]
Neunert, Michael [1]
Tirumala, Dhruva [1]
Siegel, Noah [1]
Heess, Nicolas [1]
Riedmiller, Martin [1]
Affiliations
[1] DeepMind, London, England
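Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract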
We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. Finally, we intuitively adapt the inference step to investigate the effect of increased temporal abstraction on training with pre-trained options and from scratch.
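The abstract does not spell out the inference procedure, so the following is only a rough illustration of what a dynamic programming pass over option choices along a given trajectory could look like. It assumes a standard options parameterisation (a high-level policy over options, per-option termination probabilities, and per-option action likelihoods) supplied as precomputed tables. The function name `option_posteriors` and its three arguments are hypothetical and are not taken from the paper. The sketch covers the forward inference step only; in an automatic differentiation framework the same recursion would be differentiable, which is what would allow gradients to flow through it.

```python
def option_posteriors(action_lik, term_prob, high_probs):
    """Forward recursion over options for one trajectory (illustrative sketch).

    All three arguments are hypothetical precomputed tables, indexed [t][o]:
      action_lik[t][o]  likelihood of the observed action at step t under option o
      term_prob[t][o]   probability that option o terminates at step t
      high_probs[t][o]  probability that the high-level policy picks option o at step t

    Returns a list where entry t holds p(option at t | observations up to t).
    Assumes at least one option has non-zero action likelihood at every step.
    """
    num_options = len(high_probs[0])
    posteriors = []
    prev = None
    for t in range(len(action_lik)):
        if prev is None:
            # First step: the option is drawn directly from the high-level policy.
            prior = list(high_probs[0])
        else:
            # Probability mass that terminated and is redistributed by the high-level policy.
            switch_mass = sum(prev[j] * term_prob[t][j] for j in range(num_options))
            # Either the previous option continues, or a new option is selected.
            prior = [
                prev[o] * (1.0 - term_prob[t][o]) + switch_mass * high_probs[t][o]
                for o in range(num_options)
            ]
        # Weight by how well each option explains the observed action, then normalise.
        joint = [prior[o] * action_lik[t][o] for o in range(num_options)]
        total = sum(joint)
        prev = [x / total for x in joint]
        posteriors.append(prev)
    return posteriors
```

Because the recursion only needs the observed states and actions, it can in principle be applied to any stored trajectory, which is consistent with the off-policy setting the abstract describes.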
Pages: 11