High Confidence Off-Policy Evaluation

Cited by: 0
Authors
Thomas, Philip S. [1, 2]
Theocharous, Georgios [1]
Ghavamzadeh, Mohammad [1, 3]
Affiliations
[1] Adobe Res, San Jose, CA 95110 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] INRIA Lille, Lille, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
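The abstract describes two ingredients: estimating one policy's performance from trajectories collected with another policy, and attaching a lower confidence bound to that estimate. The Python sketch below illustrates both under stated assumptions. It is not the bound proposed in the paper. It computes per-trajectory ordinary importance-sampling estimates and then substitutes Hoeffding's inequality to obtain a bound that holds with probability at least 1 - delta. It assumes that rewards are nonnegative, that the behavior policy gives nonzero probability to every action the evaluation policy can take, and that each importance-weighted return is clipped at a user-chosen threshold `c`. Clipping a nonnegative quantity can only lower its mean, so the bound remains valid. The trajectory and policy representations and the function names `importance_weighted_return` and `hoeffding_lower_bound` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical representation (not from the paper): a trajectory is a list of
# (state, action, reward) steps, and a policy maps (state, action) to a probability.
Step = Tuple[int, int, float]
Policy = Callable[[int, int], float]


def importance_weighted_return(traj: Sequence[Step], pi_e: Policy, pi_b: Policy) -> float:
    """Ordinary importance sampling: the undiscounted return of `traj`
    (generated by the behavior policy pi_b) multiplied by the product of
    likelihood ratios pi_e / pi_b. Its expectation is pi_e's expected return,
    assuming pi_b(s, a) > 0 wherever pi_e(s, a) > 0."""
    weight, ret = 1.0, 0.0
    for s, a, r in traj:
        weight *= pi_e(s, a) / pi_b(s, a)
        ret += r
    return weight * ret


def hoeffding_lower_bound(trajs: Sequence[Sequence[Step]], pi_e: Policy,
                          pi_b: Policy, c: float, delta: float) -> float:
    """Lower bound on pi_e's expected return that holds with probability
    at least 1 - delta, assuming i.i.d. trajectories and nonnegative rewards.
    Each weighted return X_i >= 0 is clipped to min(X_i, c). This lowers the
    mean, so a bound on the clipped mean is still a valid (more conservative)
    bound on E[X]."""
    xs = [min(importance_weighted_return(t, pi_e, pi_b), c) for t in trajs]
    n = len(xs)
    mean = sum(xs) / n
    # One-sided Hoeffding bound for i.i.d. samples in [0, c]:
    # P(E[X] < mean - t) <= exp(-2 n t^2 / c^2), solved for t at level delta.
    return mean - c * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
```

As an example, consider a one-step problem with two actions. The behavior policy picks each action with probability 0.5, and the evaluation policy picks the rewarding action (reward 1) with probability 0.9. Take 100 logged trajectories split evenly between the two actions. The clipped estimates then average 0.9, and with `c = 2.0` and `delta = 0.05` the bound is about 0.655. Lowering `c` shrinks the Hoeffding term but clips away more of the large importance-weighted returns, so choosing the threshold is a trade-off.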
Pages: 3000-3006
Page count: 7
Related papers (50 in total)
  • [11] Representation Balancing MDPs for Off-Policy Policy Evaluation
    Liu, Yao
    Gottesman, Omer
    Raghu, Aniruddh
    Komorowski, Matthieu
    Faisal, Aldo
    Doshi-Velez, Finale
    Brunskill, Emma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [12] Off-Policy Evaluation via the Regularized Lagrangian
    Yang, Mengjiao
    Nachum, Ofir
    Dai, Bo
    Li, Lihong
    Schuurmans, Dale
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [13] Consistent On-Line Off-Policy Evaluation
    Hallak, Assaf
    Mannor, Shie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [14] IntOPE: Off-Policy Evaluation in the Presence of Interference
    Bai, Yuqi
    Zhao, Ziyu
    Zhu, Minqin
    Kuang, Kun
    arXiv, 2024,
  • [15] Offline RL Without Off-Policy Evaluation
    Brandfonbrener, David
    Whitney, William F.
    Ranganath, Rajesh
    Bruna, Joan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [16] Learning Action Embeddings for Off-Policy Evaluation
    Cief, Matej
    Golebiowski, Jacek
    Schmidt, Philipp
    Abedjan, Ziawasch
    Bekasov, Artur
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, 2024, 14608 : 108 - 122
  • [17] A perspective on off-policy evaluation in reinforcement learning
    Li, Lihong
    FRONTIERS OF COMPUTER SCIENCE, 2019, 13 (05) : 911 - 912
  • [19] Off-Policy Evaluation in Doubly Inhomogeneous Environments
    Bian, Zeyu
    Shi, Chengchun
    Qi, Zhengling
    Wang, Lan
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024,
  • [20] Distributional Off-Policy Evaluation for Slate Recommendations
    Chaudhari, Shreyas
    Arbour, David
    Theocharous, Georgios
    Vlassis, Nikos
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 8, 2024, : 8265 - 8273