High Confidence Off-Policy Evaluation

Cited by: 0
Authors
Thomas, Philip S. [1 ,2 ]
Theocharous, Georgios [1 ]
Ghavamzadeh, Mohammad [1 ,3 ]
Affiliations
[1] Adobe Res, San Jose, CA 95110 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] INRIA Lille, Lille, France
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
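The estimator described in the abstract can be sketched as per-trajectory importance sampling, with a confidence bound from a standard concentration inequality. The sketch below uses a simple Hoeffding-style bound and assumes the importance-weighted returns are bounded in a known range (the paper derives a tighter bound without that restriction); all function names here are illustrative, not from the paper.

```python
import math

def importance_weighted_return(trajectory, pi_e, pi_b):
    """Per-trajectory importance-sampling estimate of the evaluation
    policy's expected return, computed from a trajectory generated by
    the behavior policy.

    trajectory: list of (state, action, reward) tuples.
    pi_e(a, s), pi_b(a, s): action probabilities under the evaluation
    and behavior policies, respectively.
    """
    weight, ret = 1.0, 0.0
    for s, a, r in trajectory:
        weight *= pi_e(a, s) / pi_b(a, s)  # likelihood ratio of the trajectory
        ret += r
    return weight * ret

def hoeffding_lower_bound(estimates, upper, delta=0.05):
    """(1 - delta)-confidence lower bound on the mean of i.i.d.
    estimates assumed to lie in [0, upper]. A simple stand-in for the
    paper's concentration inequality, which is tighter for the
    heavy-tailed distributions importance sampling produces.
    """
    n = len(estimates)
    mean = sum(estimates) / n
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
```

In practice the importance weights are unbounded, so a Hoeffding bound only applies after clipping the weighted returns to `[0, upper]`; handling the unclipped, heavy-tailed case is precisely what the paper's method addresses.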
Pages: 3000-3006
Page count: 7
Related papers
50 in total (items [21]-[30] shown)
  • [21] Control Variates for Slate Off-Policy Evaluation. Vlassis, Nikos; Chandrashekar, Ashok; Gil, Fernando Amat; Kallus, Nathan. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
  • [22] Reliable Off-Policy Evaluation for Reinforcement Learning. Wang, Jie; Gao, Rui; Zha, Hongyuan. Operations Research, 2024, 72(2): 699-716.
  • [23] Handling Confounding for Realistic Off-Policy Evaluation. Sohoney, Saurabh; Prabhu, Nikita; Chaoji, Vineet. Companion Proceedings of the World Wide Web Conference 2018 (WWW 2018), 2018: 33-34.
  • [24] Debiased Off-Policy Evaluation for Recommendation Systems. Narita, Yusuke; Yasui, Shota; Yata, Kohei. 15th ACM Conference on Recommender Systems (RecSys 2021), 2021: 372-379.
  • [25] Off-Policy Evaluation in Partially Observable Environments. Tennenholtz, Guy; Mannor, Shie; Shalit, Uri. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020), 2020, 34: 10276-10283.
  • [26] On the Design of Estimators for Bandit Off-Policy Evaluation. Vlassis, Nikos; Bibaut, Aurelien; Dimakopoulou, Maria; Jebara, Tony. International Conference on Machine Learning (ICML 2019), 2019, 97.
  • [27] Data Poisoning Attacks on Off-Policy Policy Evaluation Methods. Lobo, Elita; Singh, Harvineet; Petrik, Marek; Rudin, Cynthia; Lakkaraju, Himabindu. Uncertainty in Artificial Intelligence (UAI 2022), 2022, 180: 1264-1274.
  • [28] Off-Policy Evaluation with Policy-Dependent Optimization Response. Guo, Wenshuo; Jordan, Michael I.; Zhou, Angela. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [29] Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. Shi, Chengchun; Zhu, Jin; Ye, Shen; Luo, Shikai; Zhu, Hongtu; Song, Rui. Journal of the American Statistical Association, 2024, 119(545): 273-284.
  • [30] Policy-Adaptive Estimator Selection for Off-Policy Evaluation. Udagawa, Takuma; Kiyohara, Haruka; Narita, Yusuke; Saito, Yuta; Tateno, Kei. Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023), 2023: 10025-10033.