Universal Off-Policy Evaluation

Cited: 0
Authors
Chandak, Yash [1 ]
Niekum, Scott [2 ]
da Silva, Bruno Castro [1 ]
Learned-Miller, Erik [1 ]
Brunskill, Emma [3 ]
Thomas, Philip S. [1 ]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01003 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
[3] Stanford Univ, Stanford, CA 94305 USA
Funding
US National Science Foundation;
Keywords
MARKOV DECISION-PROCESSES; WILD BOOTSTRAP; VARIANCE; INFERENCE; BOUNDS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO): one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
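For intuition, the plug-in idea behind such an estimator can be sketched in a few lines, assuming per-trajectory returns G_i logged under a behavior policy together with their per-trajectory importance ratios rho_i. The sketch below is a minimal illustration under those assumptions (function names and toy data are hypothetical, not the authors' released code): estimate the importance-weighted CDF of the return once, then read any parameter of interest off that single estimate.

```python
import numpy as np

def uno_cdf(returns, rho, nu_grid):
    """Importance-weighted empirical CDF of the return under the
    evaluation policy: F_hat(nu) = (1/n) * sum_i rho_i * 1{G_i <= nu}.
    `rho` holds per-trajectory importance ratios. Clipping plus a
    running maximum keeps the raw estimate a valid (monotone) CDF."""
    G = np.asarray(returns, dtype=float)
    rho = np.asarray(rho, dtype=float)
    F = np.array([(rho * (G <= nu)).mean() for nu in nu_grid])
    return np.maximum.accumulate(np.clip(F, 0.0, 1.0))

def quantile_from_cdf(nu_grid, F, alpha):
    """Plug-in quantile: smallest grid point nu with F(nu) >= alpha."""
    idx = np.searchsorted(F, alpha, side="left")
    return nu_grid[min(idx, len(nu_grid) - 1)]

def cvar_from_cdf(nu_grid, F, alpha, k=500):
    """Lower-tail CVaR_alpha: average of the quantile function over (0, alpha]."""
    us = np.linspace(alpha / k, alpha, k)
    return float(np.mean([quantile_from_cdf(nu_grid, F, u) for u in us]))

# Toy usage with synthetic data (values are illustrative only).
rng = np.random.default_rng(0)
G = rng.normal(1.0, 0.5, size=1000)        # returns logged under the behavior policy
rho = rng.lognormal(0.0, 0.3, size=1000)   # stand-in importance ratios
nu_grid = np.linspace(G.min() - 1.0, G.max() + 1.0, 512)
F = uno_cdf(G, rho, nu_grid)
mean_est = float(np.mean(rho * G))         # ordinary importance-sampling mean
median_est = quantile_from_cdf(nu_grid, F, 0.5)
cvar10_est = cvar_from_cdf(nu_grid, F, 0.10)
```

Because every quantity is read off the same CDF estimate, the mean, median, and CVaR obtained this way are mutually consistent, which is the practical appeal of one universal estimator over a bespoke estimator per parameter.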
Pages: 16
Related Papers
50 records in total
  • [21] Off-Policy Evaluation in Partially Observable Environments
    Tennenholtz, Guy
    Mannor, Shie
    Shalit, Uri
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10276 - 10283
  • [22] On the Design of Estimators for Bandit Off-Policy Evaluation
    Vlassis, Nikos
    Bibaut, Aurelien
    Dimakopoulou, Maria
    Jebara, Tony
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [23] Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
    Lobo, Elita
    Singh, Harvineet
    Petrik, Marek
    Rudin, Cynthia
    Lakkaraju, Himabindu
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 1264 - 1274
  • [24] Off-Policy Evaluation with Policy-Dependent Optimization Response
    Guo, Wenshuo
    Jordan, Michael I.
    Zhou, Angela
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [25] Policy-Adaptive Estimator Selection for Off-Policy Evaluation
    Udagawa, Takuma
    Kiyohara, Haruka
    Narita, Yusuke
    Saito, Yuta
    Tateno, Kei
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023 : 10025 - 10033
  • [26] Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation
    Keramati, Ramtin
    Gottesman, Omer
    Celi, Leo Anthony
    Doshi-Velez, Finale
    Brunskill, Emma
    CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 174, 2022, 174 : 397 - 410
  • [27] Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
    Jiang, Nan
    Huang, Jiawei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [28] Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
    Wang, Yu-Xiang
    Agarwal, Alekh
    Dudik, Miroslav
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [29] Conformal Off-Policy Evaluation in Markov Decision Processes
    Foffano, Daniele
    Russo, Alessio
    Proutiere, Alexandre
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023 : 3087 - 3094
  • [30] Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
    Hanna, Josiah P.
    Stone, Peter
    Niekum, Scott
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017 : 538 - 546