Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Cited by: 0

Authors:

Thomas, Philip S.

Brunskill, Emma

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48 | 2016 / Vol. 48

Funding:

US National Science Foundation (NSF)

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods; that is, it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang & Li, 2015), and a new way to mix between model-based and importance-sampling-based estimates.
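For readers unfamiliar with the setting, the following is a minimal Python sketch of the importance-sampling side of off-policy evaluation that the abstract refers to. It is not the authors' estimator: it shows only ordinary trajectory-wise importance sampling plus a fixed-weight convex blend with a model-based value. The names `importance_sampling_estimate`, `blend`, `uniform`, `always_one`, and the toy data are all hypothetical, and the fixed `alpha` stands in for whatever data-driven mixing rule the paper actually proposes.

```python
from typing import Callable, List, Tuple

# One trajectory is a list of (state, action, reward) steps (assumed layout).
Step = Tuple[int, int, float]
# A policy maps (state, action) to the probability of taking that action.
Policy = Callable[[int, int], float]


def importance_sampling_estimate(
    trajectories: List[List[Step]],
    behavior: Policy,
    evaluation: Policy,
    gamma: float = 1.0,
) -> float:
    """Ordinary importance sampling estimate of the evaluation policy's
    expected discounted return, using data gathered by the behavior policy."""
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # discounted return of this trajectory
        for t, (state, action, reward) in enumerate(trajectory):
            weight *= evaluation(state, action) / behavior(state, action)
            ret += (gamma ** t) * reward
        total += weight * ret
    return total / len(trajectories)


def blend(model_based: float, is_based: float, alpha: float) -> float:
    """Fixed convex combination of two estimates (illustrative placeholder,
    not the paper's mixing scheme)."""
    return alpha * model_based + (1.0 - alpha) * is_based


# Toy one-step problem: action 1 yields reward 1, action 0 yields reward 0.
uniform = lambda state, action: 0.5                           # behavior policy
always_one = lambda state, action: 1.0 if action == 1 else 0.0  # evaluation policy
data = [[(0, 1, 1.0)], [(0, 0, 0.0)]]

estimate = importance_sampling_estimate(data, uniform, always_one)
mixed = blend(model_based=0.8, is_based=estimate, alpha=0.5)
```

In this toy case the evaluation policy always earns reward 1, and the importance-weighted average of the two logged trajectories recovers that value. With longer horizons the product of ratios can have very high variance, which is the kind of inefficiency that combining importance sampling with a model is meant to reduce.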

Pages: 10
