Universal Off-Policy Evaluation

Cited by: 0

Authors:

Chandak, Yash [1]

Niekum, Scott [2]

da Silva, Bruno Castro [1]

Learned-Miller, Erik [1]

Brunskill, Emma [3]

Thomas, Philip S. [1]

Affiliations:

[1] Univ Massachusetts, Amherst, MA 01003 USA

[2] Univ Texas Austin, Austin, TX 78712 USA

[3] Stanford Univ, Stanford, CA 94305 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021

Funding:

National Science Foundation (USA);

Keywords:

MARKOV DECISION-PROCESSES; WILD BOOTSTRAP; VARIANCE; INFERENCE; BOUNDS;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO), one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
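The abstract does not spell out the estimator, so the following is only a minimal sketch of one standard way to estimate a return distribution off-policy: an importance-weighted empirical CDF, from which parameters such as a quantile or the mean can be read off. The function names (`weighted_cdf`, `quantile_from_cdf`, `mean_from_cdf`) are hypothetical, the importance ratios are assumed to be supplied by the caller, and no high-confidence bounds are computed. It should not be read as the paper's exact method.

```python
import numpy as np


def weighted_cdf(returns, weights):
    """Importance-weighted empirical CDF of the return (illustrative sketch).

    returns: returns G_i observed under the data-collecting policy
    weights: importance ratios rho_i (assumed given; how they are computed
             is outside the scope of this sketch)
    """
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(returns)
    g = returns[order]
    # F(nu) = (1/n) * sum_i rho_i * 1[G_i <= nu], clipped into [0, 1]
    cdf = np.clip(np.cumsum(weights[order]) / len(g), 0.0, 1.0)
    return g, cdf


def quantile_from_cdf(g, cdf, alpha):
    """Smallest observed return whose estimated CDF value is at least alpha."""
    idx = np.searchsorted(cdf, alpha, side="left")
    return float(g[min(idx, len(g) - 1)])


def mean_from_cdf(g, cdf):
    """Mean implied by the estimated CDF (mass = increments of the CDF)."""
    mass = np.diff(np.concatenate(([0.0], cdf)))
    return float(np.sum(mass * g))


if __name__ == "__main__":
    # Toy data: four returns, with all weight shifted onto the two smallest.
    g, cdf = weighted_cdf([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 0.0, 0.0])
    print(cdf, quantile_from_cdf(g, cdf, 0.5), mean_from_cdf(g, cdf))
```

Working from a single CDF estimate is a natural design choice here because every parameter listed in the abstract (mean, variance, quantiles, inter-quantile range, CVaR) is a functional of the return distribution. Note that when the weights do not sum to the sample size, the clipped estimate may not reach 1, so the implied mean in this sketch is not normalized.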


Pages: 16
