Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Citations: 0
Authors
Thomas, Philip S.
Brunskill, Emma
Funding
U.S. National Science Foundation
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where deploying a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods; that is, it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang & Li, 2015), and a new way of blending model-based and importance-sampling-based estimates.
Pages: 10
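
The abstract says the estimator extends the per-decision doubly robust (DR) estimator of Jiang & Li (2015). For orientation, here is a minimal Python sketch of that standard DR estimator, not the authors' code; the names (`doubly_robust_estimate`, `pi_e`, `pi_b`, `q_hat`) and the callable interfaces are illustrative assumptions.

```python
import numpy as np

def doubly_robust_estimate(trajectories, pi_e, pi_b, q_hat, n_actions, gamma=1.0):
    """Per-decision doubly robust (DR) estimate of the evaluation policy's value.

    trajectories : list of trajectories, each a list of (state, action, reward)
                   tuples collected under the behavior policy pi_b.
    pi_e(a, s), pi_b(a, s) : action probabilities under the evaluation and
                             behavior policies (illustrative signatures).
    q_hat(s, a) : an approximate action-value model for the evaluation policy.
    """
    per_trajectory = []
    for traj in trajectories:
        dr = 0.0  # return estimate, built backward from the end of the trajectory
        for (s, a, r) in reversed(traj):
            rho = pi_e(a, s) / pi_b(a, s)  # per-step importance weight
            # Model-based baseline: expected q_hat under the evaluation policy.
            v_hat = sum(pi_e(b, s) * q_hat(s, b) for b in range(n_actions))
            # DR recursion: baseline plus importance-weighted residual correction.
            dr = v_hat + rho * (r + gamma * dr - q_hat(s, a))
        per_trajectory.append(dr)
    return float(np.mean(per_trajectory))
```

The paper's second advance then mixes such model-based estimates with importance-sampling estimates; the specific blending scheme is the paper's contribution and is not reproduced in this sketch.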