Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Cited: 0
Authors
Kallus, Nathan [1 ]
Uehara, Masatoshi [2 ]
Affiliations
[1] Cornell Univ, New York, NY 10021 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
Funding
US National Science Foundation;
Keywords
ROBUSTNESS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimators. DR and its variants ensure semiparametric local efficiency when the Q-functions are well specified, but when they are misspecified, DR can perform worse than both IS and SNIS; nor does DR enjoy SNIS's inherent stability and boundedness. We propose new estimators for OPE, based on empirical likelihood, that are always more efficient than IS, SNIS, and DR and that satisfy the same stability and boundedness properties as SNIS. Along the way, we categorize various desirable properties of OPE estimators and classify existing estimators according to them. Beyond these theoretical guarantees, empirical studies suggest the new estimators offer practical advantages.
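For readers unfamiliar with the estimators the abstract compares, a minimal sketch of their standard contextual-bandit forms may help; these definitions come from the broader OPE literature, not from this paper, and the notation here is illustrative. Given logged data $\{(x_i, a_i, r_i)\}_{i=1}^{n}$ collected under a behavior policy $\pi_b$, a target policy $\pi_e$, importance weights $w_i = \pi_e(a_i \mid x_i) / \pi_b(a_i \mid x_i)$, and a fitted Q-function (outcome model) $\hat q$:

$$
\hat V_{\mathrm{IS}} = \frac{1}{n}\sum_{i=1}^{n} w_i r_i, \qquad
\hat V_{\mathrm{SNIS}} = \frac{\sum_{i=1}^{n} w_i r_i}{\sum_{i=1}^{n} w_i}, \qquad
\hat V_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\Big[\sum_{a}\pi_e(a \mid x_i)\,\hat q(x_i, a) + w_i\big(r_i - \hat q(x_i, a_i)\big)\Big].
$$

Because $\hat V_{\mathrm{SNIS}}$ is a convex combination of the observed rewards, it necessarily stays within their range (boundedness), whereas $\hat V_{\mathrm{DR}}$ carries no such guarantee when the plug-in $\hat q$ is poor; this is the stability and boundedness gap the abstract highlights.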
Pages: 10