Data-efficient Hindsight Off-policy Option Learning

Cited by: 0
Authors
Wulfmeier, Markus [1 ]
Rao, Dushyant [1 ]
Hafner, Roland [1 ]
Lampe, Thomas [1 ]
Abdolmaleki, Abbas [1 ]
Hertweck, Tim [1 ]
Neunert, Michael [1 ]
Tirumala, Dhruva [1 ]
Siegel, Noah [1 ]
Heess, Nicolas [1 ]
Riedmiller, Martin [1 ]
Institutions
[1] DeepMind, London, England
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. Finally, we intuitively adapt the inference step to investigate the effect of increased temporal abstraction on training with pre-trained options and from scratch.
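The dynamic-programming inference the abstract refers to is, at its core, a forward pass over latent option choices, analogous to the forward algorithm for hidden Markov models. Below is a minimal NumPy sketch of such a pass; the function name, the tabular per-step action likelihoods, and the single option-switching matrix are illustrative assumptions, not the paper's implementation (HO2 backpropagates through this style of recursion with learned, neural-network components).

```python
import numpy as np

def option_forward_pass(action_likelihoods, switch_matrix, initial_dist):
    """Forward-algorithm inference over latent option choices.

    action_likelihoods: (T, K) array, likelihood of each observed action
        under each of K option policies at each of T timesteps.
    switch_matrix: (K, K) row-stochastic array, probability of moving from
        option i to option j between timesteps (diagonal = continuation).
    initial_dist: (K,) prior over the first option.
    Returns the per-step posterior over options and the trajectory
    log-likelihood.
    """
    T, K = action_likelihoods.shape
    posteriors = np.zeros((T, K))
    log_lik = 0.0
    alpha = initial_dist.astype(float).copy()
    for t in range(T):
        if t > 0:
            alpha = alpha @ switch_matrix          # propagate option beliefs
        alpha = alpha * action_likelihoods[t]      # weight by observed action
        norm = alpha.sum()
        log_lik += np.log(norm)                    # accumulate evidence
        alpha = alpha / norm                       # normalize for stability
        posteriors[t] = alpha
    return posteriors, log_lik
```

The returned posteriors correspond to the "likely option choices" inferred in hindsight from a trajectory; implemented in an autodiff framework, the recursion itself is differentiable, which is what allows end-to-end, off-policy training of all policy components.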
Pages: 11
Related papers
50 records in total
  • [11] Learning with Options that Terminate Off-Policy
    Harutyunyan, Anna
    Vrancx, Peter
    Bacon, Pierre-Luc
    Precup, Doina
    Nowe, Ann
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3173 - 3182
  • [12] Online Learning with Off-Policy Feedback
    Gabbianelli, Germano
    Neu, Gergely
    Papini, Matteo
    [J]. INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023, 201 : 620 - 641
  • [13] Off-policy Learning for Multiple Loggers
    He, Li
    Xia, Long
    Zeng, Wei
    Ma, Zhi-Ming
    Zhao, Yihong
    Yin, Dawei
    [J]. KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 1184 - 1193
  • [14] Exponential Smoothing for Off-Policy Learning
    Aouali, Imad
    Brunel, Victor-Emmanuel
    Rohde, David
    Korba, Anna
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [15] Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
    Kallus, Nathan
    Uehara, Masatoshi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [16] An efficient and lightweight off-policy actor–critic reinforcement learning framework
    Zhang, Huaqing
    Ma, Hongbin
    Zhang, Xiaofei
    Mersha, Bemnet Wondimagegnehu
    Wang, Li
    Jin, Ying
    [J]. APPLIED SOFT COMPUTING, 2024, 163
  • [17] More Efficient Off-Policy Evaluation through Regularized Targeted Learning
    Bibaut, Aurelien F.
    Malenica, Ivana
    Vlassis, Nikos
    van der Laan, Mark J.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [18] Off-Policy Evaluation via Off-Policy Classification
    Irpan, Alex
    Rao, Kanishka
    Bousmalis, Konstantinos
    Harris, Chris
    Ibarz, Julian
    Levine, Sergey
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [19] Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
    Zhong, Rujie
    Zhang, Duohan
    Schafer, Lukas
    Albrecht, Stefano V.
    Hanna, Josiah P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [20] Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
    Kallus, Nathan
    Uehara, Masatoshi
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21