Universal Off-Policy Evaluation

Cited by: 0

Authors:

Chandak, Yash [1]

Niekum, Scott [2]

da Silva, Bruno Castro [1]

Learned-Miller, Erik [1]

Brunskill, Emma [3]

Thomas, Philip S. [1]

Affiliations:

[1] Univ Massachusetts, Amherst, MA 01003 USA

[2] Univ Texas Austin, Austin, TX 78712 USA

[3] Stanford Univ, Stanford, CA 94305 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021

Funding:

U.S. National Science Foundation;

Keywords:

MARKOV DECISION-PROCESSES; WILD BOOTSTRAP; VARIANCE; INFERENCE; BOUNDS;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO): one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
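The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of estimating a return distribution off-policy, assuming each logged trajectory comes with a return and an importance weight (the ratio of its probability under the new policy to that under the data-collecting policy). The function names `weighted_cdf`, `quantile_from_cdf`, and `mean_from_cdf` are hypothetical and chosen for illustration; the clipping to [0, 1] is a simplifying assumption, and no confidence bounds are computed here.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_cdf(returns, weights):
    """Importance-weighted empirical CDF of the return (illustrative sketch).

    returns: observed returns, one per logged trajectory.
    weights: assumed per-trajectory importance weights (non-negative).
    Gives back the sorted returns and the estimated CDF at each of them.
    """
    pairs = sorted(zip(returns, weights))      # sort trajectories by return
    g = [r for r, _ in pairs]
    n = len(g)
    # Running sum of weights divided by n, clipped into [0, 1] (assumption).
    cdf = [min(1.0, max(0.0, c / n)) for c in accumulate(w for _, w in pairs)]
    return g, cdf


def quantile_from_cdf(g, cdf, alpha):
    """Smallest observed return whose estimated CDF is at least alpha."""
    idx = bisect_left(cdf, alpha)
    return g[min(idx, len(g) - 1)]


def mean_from_cdf(g, cdf):
    """Mean implied by the estimated CDF, using its increments as masses."""
    total, prev = 0.0, 0.0
    for value, level in zip(g, cdf):
        total += value * (level - prev)
        prev = level
    return total


if __name__ == "__main__":
    # Toy data (made up for illustration): four returns with their weights.
    g, cdf = weighted_cdf([1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 2.0, 0.0])
    print("median estimate:", quantile_from_cdf(g, cdf, 0.5))
    print("mean estimate:", mean_from_cdf(g, cdf))
```

Once a distribution estimate of this kind is available, the other parameters listed in the abstract (variance, inter-quantile range, CVaR) can be read off it in the same way as the mean and quantile here.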


Pages: 16
