Universal Off-Policy Evaluation

Times Cited: 0
|
Authors
Chandak, Yash [1 ]
Niekum, Scott [2 ]
da Silva, Bruno Castro [1 ]
Learned-Miller, Erik [1 ]
Brunskill, Emma [3 ]
Thomas, Philip S. [1 ]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01003 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
[3] Stanford Univ, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
MARKOV DECISION-PROCESSES; WILD BOOTSTRAP; VARIANCE; INFERENCE; BOUNDS;
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO): one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
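For intuition only, the following is a minimal sketch (not the authors' released implementation) of the core idea the abstract describes: build an importance-weighted empirical CDF of returns from behavior-policy trajectories, then read any distributional parameter off that CDF. The function names, the inputs returns (per-trajectory returns G_i), rho (full-trajectory importance weights), g_min/g_max (known return bounds), and the choice to lump leftover probability mass at the bounds are all assumptions made for illustration; the paper's actual contribution, high-confidence bounds on these parameters, is omitted here.

import numpy as np

def uno_style_cdf(returns, rho, g_min, g_max):
    # Sort trajectory returns and carry their importance weights along.
    order = np.argsort(returns)
    g = np.asarray(returns, dtype=float)[order]
    w = np.asarray(rho, dtype=float)[order]
    n = len(g)
    # F_hat(v) = (1/n) * sum_i rho_i * 1[G_i <= v], clipped to [0, 1] and made
    # non-decreasing; remaining mass is placed at the known return bounds.
    cdf = np.maximum.accumulate(np.clip(np.cumsum(w) / n, 0.0, 1.0))
    g = np.concatenate(([g_min], g, [g_max]))
    cdf = np.concatenate(([0.0], cdf, [1.0]))
    return g, cdf

def parameters_from_cdf(g, cdf, alpha=0.25):
    dF = np.diff(cdf)   # probability mass carried by each support point
    x = g[1:]           # support point associated with each mass increment
    mean = np.sum(x * dF)
    variance = np.sum((x - mean) ** 2 * dF)
    median = g[np.searchsorted(cdf, 0.5)]   # generalized inverse CDF at 0.5
    # CVaR_alpha: average return over the worst alpha-fraction of outcomes.
    tail_mass = np.minimum(cdf[1:], alpha) - np.minimum(cdf[:-1], alpha)
    cvar = np.sum(x * tail_mass) / alpha
    return mean, variance, median, cvar

With rho set to the product of per-step ratios pi_e(a_t|s_t)/pi_b(a_t|s_t) along each trajectory, the single estimated CDF simultaneously yields the mean, variance, median and other quantiles, inter-quantile range, and CVaR, which is what makes such an estimator "universal" in the sense used in the abstract.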
Pages: 16
Related Papers
50 records in total
  • [41] Average-Reward Off-Policy Policy Evaluation with Function Approximation
    Zhang, Shangtong
    Wan, Yi
    Sutton, Richard S.
    Whiteson, Shimon
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [42] A Nonparametric Off-Policy Policy Gradient
    Tosatto, Samuele
    Carvalho, Joao
    Abdulsamad, Hany
    Peters, Jan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [43] Boosted Off-Policy Learning
    London, Ben
    Lu, Levi
    Sandler, Ted
    Joachims, Thorsten
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [44] Supervised Off-Policy Ranking
    Jin, Yue
    Zhang, Yue
    Qin, Tao
    Zhang, Xudong
    Yuan, Jian
    Li, Houqiang
    Liu, Tie-Yan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022 : 10323 - 10339
  • [45] Q(λ) with Off-Policy Corrections
    Harutyunyan, Anna
    Bellemare, Marc G.
    Stepleton, Tom
    Munos, Remi
    ALGORITHMIC LEARNING THEORY, (ALT 2016), 2016, 9925 : 305 - 320
  • [46] On the Relation between Policy Improvement and Off-Policy Minimum-Variance Policy Evaluation
    Metelli, Alberto Maria
    Meta, Samuele
    Restelli, Marcello
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 1423 - 1433
  • [47] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
    Yin, Ming
    Wang, Yu-Xiang
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [48] Off-Policy Evaluation with Deficient Support Using Side Information
    Felicioni, Nicolo
    Dacrema, Maurizio Ferrari
    Restelli, Marcello
    Cremonesi, Paolo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [49] Off-policy evaluation for tabular reinforcement learning with synthetic trajectories
    Wang, Weiwei
    Li, Yuqiang
    Wu, Xianyi
    STATISTICS AND COMPUTING, 2024, 34
  • [50] Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
    Hao, Botao
    Ji, Xiang
    Duan, Yaqi
    Lu, Hao
    Szepesvari, Csaba
    Wang, Mengdi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139