Data-efficient Hindsight Off-policy Option Learning

Authors:

Wulfmeier, Markus [1]

Rao, Dushyant [1]

Hafner, Roland [1]

Lampe, Thomas [1]

Abdolmaleki, Abbas [1]

Hertweck, Tim [1]

Neunert, Michael [1]

Tirumala, Dhruva [1]

Siegel, Noah [1]

Heess, Nicolas [1]

Riedmiller, Martin [1]

Affiliations:

[1] DeepMind, London, England

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. Finally, we intuitively adapt the inference step to investigate the effect of increased temporal abstraction on training with pre-trained options and from scratch.

Pages: 11
