High Confidence Off-Policy Evaluation

Cited by: 0
Authors
Thomas, Philip S. [1 ,2 ]
Theocharous, Georgios [1 ]
Ghavamzadeh, Mohammad [1 ,3 ]
Affiliations
[1] Adobe Res, San Jose, CA 95110 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] INRIA Lille, Lille, France
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
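To make the abstract's idea concrete, below is a minimal Python sketch of one standard recipe for such a lower bound: per-trajectory importance sampling combined with a concentration inequality. This is an illustration under stated assumptions, not the paper's own construction (the paper develops tighter concentration inequalities than the simple Hoeffding bound shown here). The function names, the `weight_cap` clipping parameter, and the requirement that per-trajectory estimates lie in [0, upper] are all assumptions of this sketch.

```python
import numpy as np

def importance_sampled_returns(trajectories, pi_e, pi_b, gamma=1.0, weight_cap=None):
    """Per-trajectory importance-sampled return estimates (sketch).

    trajectories: list of trajectories, each a list of (state, action, reward)
    tuples collected by executing the behavior policy.
    pi_e(s, a), pi_b(s, a): action probabilities under the evaluation and
    behavior policies (assumed known).
    weight_cap: optional (hypothetical) clip on the cumulative importance
    weight, which keeps the estimates bounded so Hoeffding's inequality applies.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(s, a) / pi_b(s, a)  # cumulative importance weight
            ret += (gamma ** t) * r            # discounted return of the trajectory
        if weight_cap is not None:
            weight = min(weight, weight_cap)   # clip so estimates stay bounded
        estimates.append(weight * ret)
    return np.array(estimates)

def hoeffding_lower_bound(estimates, upper, delta=0.05):
    """1 - delta lower confidence bound on the mean of estimates in [0, upper].

    Hoeffding's inequality for bounded i.i.d. variables gives
        lb = mean - upper * sqrt(log(1/delta) / (2 n)).
    """
    n = len(estimates)
    return estimates.mean() - upper * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
```

Assuming the behavior policy has support wherever the evaluation policy does and rewards are nonnegative, clipping the importance weights can only pull the estimates down, so a lower confidence bound on the clipped estimator's mean remains a valid, if conservative, lower bound on the evaluation policy's expected return.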
Pages: 3000-3006
Page count: 7
Related Papers (50 total; first 10 shown)
  • [1] Hanna, Josiah P.; Stone, Peter; Niekum, Scott. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation. AAMAS'17: Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, 2017: 538-546.
  • [2] Hanna, Josiah P.; Stone, Peter; Niekum, Scott. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation. Thirty-First AAAI Conference on Artificial Intelligence, 2017: 4933-4934.
  • [3] Karampatziakis, Nikos; Mineiro, Paul; Ramdas, Aaditya. Off-Policy Confidence Sequences. International Conference on Machine Learning, Vol. 139, 2021.
  • [4] Irpan, Alex; Rao, Kanishka; Bousmalis, Konstantinos; Harris, Chris; Ibarz, Julian; Levine, Sergey. Off-Policy Evaluation via Off-Policy Classification. Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019.
  • [5] Chandak, Yash; Shankar, Shiv; Thomas, Philip S. High-Confidence Off-Policy (or Counterfactual) Variance Estimation. Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021, 35: 6939-6947.
  • [6] Chandak, Yash; Niekum, Scott; da Silva, Bruno Castro; Learned-Miller, Erik; Brunskill, Emma; Thomas, Philip S. Universal Off-Policy Evaluation. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [7] Gao, Qitong; Gao, Ge; Dong, Juncheng; Tarokh, Vahid; Chi, Min; Pajic, Miroslav. Off-Policy Evaluation for Human Feedback. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [8] Swaminathan, Adith; Krishnamurthy, Akshay; Agarwal, Alekh; Dudik, Miroslav; Langford, John; Jose, Damien; Zitouni, Imed. Off-Policy Evaluation for Slate Recommendation. Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.
  • [9] Saito, Yuta; Udagawa, Takuma; Kiyohara, Haruka; Mogi, Kazuki; Narita, Yusuke; Tateno, Kei. Evaluating the Robustness of Off-Policy Evaluation. 15th ACM Conference on Recommender Systems (RecSys 2021), 2021: 114-123.
  • [10] Shen, Simon P.; Ma, Yecheng Jason; Gottesman, Omer; Doshi-Velez, Finale. State Relevance for Off-Policy Evaluation. International Conference on Machine Learning, Vol. 139, 2021.