Universal Off-Policy Evaluation

Cited by: 0
Authors
Chandak, Yash [1 ]
Niekum, Scott [2 ]
da Silva, Bruno Castro [1 ]
Learned-Miller, Erik [1 ]
Brunskill, Emma [3 ]
Thomas, Philip S. [1 ]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01003 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
[3] Stanford Univ, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation
Keywords
MARKOV DECISION-PROCESSES; WILD BOOTSTRAP; VARIANCE; INFERENCE; BOUNDS;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO): one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO to estimate and simultaneously bound the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
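The abstract summarizes the core construction: estimate the entire cumulative distribution of returns under the evaluation policy from behavior-policy data, then read off any distributional parameter from that estimate. Below is a minimal sketch of that idea, not the authors' code: `returns` and `weights` are hypothetical inputs holding per-trajectory returns G_i and importance ratios rho_i = prod_t pi_e(a_t|s_t) / pi_b(a_t|s_t), and only point estimates are shown (UnO's high-confidence bounds require weighted concentration results beyond this sketch).

```python
import numpy as np

def off_policy_cdf(returns, weights):
    """Importance-weighted empirical CDF of the return under the evaluation policy.

    returns : per-trajectory returns G_i observed under the behavior policy.
    weights : per-trajectory importance ratios rho_i (hypothetical inputs).
    """
    order = np.argsort(returns)
    g = np.asarray(returns, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    # F_hat(g_k) = (1/n) * sum_{i : G_i <= g_k} rho_i, clipped to [0, 1]
    # because the finite-sample weighted mass need not sum exactly to 1.
    cdf = np.clip(np.cumsum(w) / len(w), 0.0, 1.0)
    return g, cdf

def point_masses(cdf):
    # Per-return probability mass implied by the step CDF.
    return np.diff(np.concatenate(([0.0], cdf)))

def quantile(g, cdf, alpha):
    # Smallest observed return whose estimated CDF value reaches alpha.
    k = min(np.searchsorted(cdf, alpha, side="left"), len(g) - 1)
    return float(g[k])

def cvar(g, cdf, alpha):
    # Lower-tail CVaR_alpha = (1/alpha) * integral_0^alpha Q(u) du,
    # computed by accumulating return mass up to exactly alpha.
    m = point_masses(cdf)
    k = min(np.searchsorted(cdf, alpha, side="left"), len(g) - 1)
    tail = m[: k + 1].copy()
    tail[-1] -= max(float(cdf[k]) - alpha, 0.0)  # trim overshoot past alpha
    return float(np.dot(g[: k + 1], tail) / alpha)

# Hypothetical usage with simulated stand-in data:
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, size=1000)                     # stand-in G_i
weights = rng.lognormal(mean=-0.045, sigma=0.3, size=1000)   # stand-in rho_i (mean ~ 1)
g, cdf = off_policy_cdf(returns, weights)
mean_est = float(np.dot(g, point_masses(cdf)))               # mean from the same CDF
print(mean_est, quantile(g, cdf, 0.5), cvar(g, cdf, 0.10))
```

With the CDF estimate in hand, each quantity named in the abstract (mean, variance, quantiles/median, inter-quantile range, CVaR) is simply a different functional applied to the same object, which is what makes the estimator "universal."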
Pages: 16
Related Papers (50 in total)
  • [31] Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation. Hanna, Josiah P.; Stone, Peter; Niekum, Scott. Thirty-First AAAI Conference on Artificial Intelligence, 2017: 4933-4934.
  • [32] Balanced Off-Policy Evaluation in General Action Spaces. Sondhi, Arjun; Arbour, David; Dimmery, Drew. International Conference on Artificial Intelligence and Statistics, Vol. 108, 2020.
  • [33] More Robust Doubly Robust Off-policy Evaluation. Farajtabar, Mehrdad; Chow, Yinlam; Ghavamzadeh, Mohammad. International Conference on Machine Learning, Vol. 80, 2018.
  • [34] Combining Parametric and Nonparametric Models for Off-Policy Evaluation. Gottesman, Omer; Liu, Yao; Sussex, Scott; Brunskill, Emma; Doshi-Velez, Finale. International Conference on Machine Learning, Vol. 97, 2019.
  • [35] Off-policy Policy Evaluation for Sequential Decisions under Unobserved Confounding. Namkoong, Hongseok; Keramati, Ramtin; Yadlowsky, Steve; Brunskill, Emma. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
  • [37] Research on Off-Policy Evaluation in Reinforcement Learning: A Survey. Wang, S.-R.; Niu, W.-J.; Tong, E.-D.; Chen, T.; Li, H.; Tian, Y.-Z.; Liu, J.-Q.; Han, Z.; Li, Y.-D. Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45(09): 1926-1945.
  • [38] Accountable Off-Policy Evaluation with Kernel Bellman Statistics. Feng, Yihao; Ren, Tongzheng; Tang, Ziyang; Liu, Qiang. International Conference on Machine Learning, Vol. 119, 2020.
  • [39] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. Thomas, Philip S.; Brunskill, Emma. International Conference on Machine Learning, Vol. 48, 2016.
  • [40] Off-Policy Proximal Policy Optimization. Meng, Wenjia; Zheng, Qian; Pan, Gang; Yin, Yilong. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 8, 2023: 9162-9170.