Universal Off-Policy Evaluation

Cited by: 0
Authors
Chandak, Yash [1 ]
Niekum, Scott [2 ]
da Silva, Bruno Castro [1 ]
Learned-Miller, Erik [1 ]
Brunskill, Emma [3 ]
Thomas, Philip S. [1 ]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01003 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
[3] Stanford Univ, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation;
Keywords
MARKOV DECISION-PROCESSES; WILD BOOTSTRAP; VARIANCE; INFERENCE; BOUNDS;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO): one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
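The abstract's core idea is to estimate the entire distribution of returns off-policy and then read any parameter (mean, quantiles, CVaR, etc.) off that estimate. The sketch below illustrates that idea with a plain importance-weighted empirical CDF over per-trajectory returns; the function names (weighted_cdf, distribution_estimates), the grid construction, and the parameter choices are assumptions made for illustration, not the paper's UnO estimator or its high-confidence bounds.

```python
import numpy as np

def weighted_cdf(returns, weights, nu):
    """Importance-weighted empirical CDF of the return at threshold nu:
    F_hat(nu) = (1/n) * sum_i rho_i * 1{G_i <= nu}, where rho_i is the
    product of per-step ratios pi_e(a|s) / pi_b(a|s) along trajectory i."""
    return float(np.mean(weights * (returns <= nu)))

def distribution_estimates(returns, weights, alpha=0.1, grid_size=512):
    """Estimate the return CDF under the evaluation policy from behavior-policy
    data, then read the mean, median, and lower-tail CVaR off that CDF."""
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.linspace(returns.min(), returns.max(), grid_size)
    cdf = np.array([weighted_cdf(returns, weights, nu) for nu in grid])
    cdf = np.clip(np.maximum.accumulate(cdf), 0.0, 1.0)  # project onto a valid CDF

    def quantile(p):
        # Invert the estimated CDF on the grid (clamped to the last grid point).
        idx = int(np.searchsorted(cdf, p, side="left"))
        return float(grid[min(idx, len(grid) - 1)])

    # For returns bounded in [G_min, G_max]: E[G] = G_min + integral of (1 - F).
    mean = float(grid[0] + np.sum((1.0 - cdf[:-1]) * np.diff(grid)))
    median = quantile(0.5)
    # Lower-tail CVaR_alpha: average of quantiles at levels in (0, alpha].
    cvar = float(np.mean([quantile(p) for p in np.linspace(alpha / 50, alpha, 50)]))
    return {"mean": mean, "median": median, "cvar": cvar}
```

For instance, given per-trajectory returns G and per-trajectory importance weights rho (products of pi_e/pi_b along each trajectory), distribution_estimates(G, rho) returns point estimates of the mean, median, and 10% CVaR; the simultaneous high-confidence bounds discussed in the abstract are omitted from this sketch.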
Pages: 16