Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Citations: 0

Authors:

Kallus, Nathan [1]

Uehara, Masatoshi [2]

Affiliations:

[1] Cornell Univ, New York, NY 10021 USA

[2] Harvard Univ, Cambridge, MA 02138 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019, Vol. 32

Funding:

National Science Foundation (USA);

Keywords:

ROBUSTNESS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if Q-functions are well-specified, but if they are not, they can be worse than both IS and SNIS. They also do not enjoy SNIS's inherent stability and boundedness. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS. On the way, we categorize various properties and classify existing estimators by them. Besides the theoretical guarantees, empirical studies suggest the new estimators provide advantages.
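
For orientation, the three baseline estimators named in the abstract (IS, SNIS, DR) can be sketched for the contextual-bandit case as follows. This is a minimal illustration under stated assumptions, not the authors' code and not the empirical-likelihood estimators the paper proposes. The function names and the argument names (`rewards`, `weights`, `q_logged`, `v_target`) are hypothetical; `weights` is assumed to hold the density ratios of the evaluation policy to the logging policy at the logged actions, `q_logged` a fitted Q-model at the logged actions, and `v_target` that model averaged over the evaluation policy.

```python
import numpy as np

def is_estimate(rewards, weights):
    """Importance sampling: mean of weight * reward. Unbounded when weights are large."""
    r, w = np.asarray(rewards, float), np.asarray(weights, float)
    return float(np.mean(w * r))

def snis_estimate(rewards, weights):
    """Self-normalized IS: weights are rescaled to sum to one, so the
    estimate is a weighted average and stays within the observed reward range."""
    r, w = np.asarray(rewards, float), np.asarray(weights, float)
    return float(np.sum(w * r) / np.sum(w))

def dr_estimate(rewards, weights, q_logged, v_target):
    """Doubly robust: model-based value plus an importance-weighted
    correction of the model's residual at the logged action."""
    r, w = np.asarray(rewards, float), np.asarray(weights, float)
    q, v = np.asarray(q_logged, float), np.asarray(v_target, float)
    return float(np.mean(v + w * (r - q)))

# Toy data (made up for illustration): every reward equals 1.
rewards = [1.0, 1.0, 1.0]
weights = [3.0, 0.5, 2.5]
print(is_estimate(rewards, weights))    # can leave the reward range
print(snis_estimate(rewards, weights))  # stays inside the reward range
```

With all rewards equal to 1, the IS value is driven by the average weight and can exceed 1, while the SNIS value cannot leave the range of observed rewards. This is the boundedness property the abstract attributes to SNIS.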


Pages: 10
