Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Cited by: 0
Authors
Thomas, Philip S.
Brunskill, Emma
Institutions
Funding
National Science Foundation (US);
Keywords
DOI
Not available
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods; it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang & Li, 2015), and a new way to mix between model-based and importance-sampling-based estimates.
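For context, the doubly robust (DR) estimator of Jiang & Li (2015) that the abstract builds on combines a learned model of the MDP with per-step importance weights. The sketch below is a minimal Python illustration of that standard DR recursion, not the authors' extended estimator; the function names (doubly_robust_estimate, pi_e, pi_b, q_hat) and the finite-action assumption are introduced here purely for illustration.

    import numpy as np

    def doubly_robust_estimate(trajectories, actions, pi_e, pi_b, q_hat, gamma=1.0):
        """Doubly robust off-policy value estimate in the style of Jiang & Li (2015).

        trajectories : list of trajectories, each a list of (s, a, r) tuples
                       collected under the behavior policy pi_b.
        actions      : iterable of all actions (finite action space assumed).
        pi_e, pi_b   : functions (s, a) -> probability of action a in state s
                       under the evaluation / behavior policy.
        q_hat        : function (s, a) -> model-based estimate of Q^{pi_e}(s, a).
        """
        estimates = []
        for traj in trajectories:
            v_dr = 0.0  # DR value of the (empty) tail beyond the final step
            # Backward recursion:
            # V_DR(t) = V_hat(s_t) + rho_t * (r_t + gamma * V_DR(t+1) - Q_hat(s_t, a_t))
            for s, a, r in reversed(traj):
                rho = pi_e(s, a) / pi_b(s, a)  # per-step importance weight
                v_hat = sum(pi_e(s, b) * q_hat(s, b) for b in actions)  # model value of s_t
                v_dr = v_hat + rho * (r + gamma * v_dr - q_hat(s, a))
            estimates.append(v_dr)  # DR estimate for this trajectory
        return float(np.mean(estimates))

Here the model terms q_hat and v_hat act as a control variate that lowers variance, while the importance weight rho corrects for model error using the logged rewards; the paper's estimator extends this construction and, as the abstract notes, introduces a new way to mix such model-based and importance-sampling-based estimates.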
Pages: 10
Related papers
50 records in total
  • [1] Data-efficient Hindsight Off-policy Option Learning
    Wulfmeier, Markus
    Rao, Dushyant
    Hafner, Roland
    Lampe, Thomas
    Abdolmaleki, Abbas
    Hertweck, Tim
    Neunert, Michael
    Tirumala, Dhruva
    Siegel, Noah
    Heess, Nicolas
    Riedmiller, Martin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
    Yin, Ming
    Wang, Yu-Xiang
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [3] Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
    Kallus, Nathan
    Uehara, Masatoshi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [4] Safe and efficient off-policy reinforcement learning
    Munos, Remi
    Stepleton, Thomas
    Harutyunyan, Anna
    Bellemare, Marc G.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [5] A perspective on off-policy evaluation in reinforcement learning
    Li, Lihong
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2019, 13 (05) : 911 - 912
  • [6] Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
    Zhong, Rujie
    Zhang, Duohan
    Schafer, Lukas
    Albrecht, Stefano V.
    Hanna, Josiah P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [7] Reliable Off-Policy Evaluation for Reinforcement Learning
    Wang, Jie
    Gao, Rui
    Zha, Hongyuan
    [J]. OPERATIONS RESEARCH, 2024, 72 (02) : 699 - 716
  • [8] Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
    Kallus, Nathan
    Uehara, Masatoshi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Flexible Data Augmentation in Off-Policy Reinforcement Learning
    Rak, Alexandra
    Skrynnik, Alexey
    Panov, Aleksandr I.
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT I, 2021, 12854 : 224 - 235