Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Cited by: 0

Authors:

Thomas, Philip S.

Brunskill, Emma

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48 | 2016 / Vol. 48

Funding:

National Science Foundation (USA);

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods; that is, it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang & Li, 2015), and a new way to mix between model-based and importance-sampling-based estimates.

Pages: 10
