High Confidence Off-Policy Evaluation

Cited by: 0
Authors
Thomas, Philip S. [1 ,2 ]
Theocharous, Georgios [1 ]
Ghavamzadeh, Mohammad [1 ,3 ]
Affiliations
[1] Adobe Res, San Jose, CA 95110 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] INRIA Lille, Lille, France
DOI: not available
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
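The abstract describes estimating a target policy's expected return from trajectories generated by other policies, together with a lower confidence bound on that estimate. As a rough illustration only (not the paper's method, which develops a tighter concentration inequality), the sketch below combines per-trajectory importance sampling with a simple Hoeffding-style bound on weights truncated at a hypothetical cap `c`; the function name and all parameters are illustrative assumptions.

```python
import numpy as np

def is_lower_bound(behavior_probs, target_probs, returns, delta=0.05, c=2.0):
    """Illustrative sketch: importance-sampling estimate of a target policy's
    expected return, with a Hoeffding-style 1 - delta lower confidence bound.

    behavior_probs: probability of each trajectory under the behavior policy
    target_probs:   probability of the same trajectory under the target policy
    returns:        trajectory returns, assumed normalized to [0, 1]
    c:              truncation cap on the importance weights (an assumption
                    made here so Hoeffding's bounded-range condition holds)
    """
    # Per-trajectory importance weights, truncated at c.
    w = np.minimum(np.asarray(target_probs) / np.asarray(behavior_probs), c)
    x = w * np.asarray(returns)          # weighted returns, each in [0, c]
    n = len(x)
    # Hoeffding: with probability >= 1 - delta, E[x] >= mean(x) - c*sqrt(ln(1/delta)/(2n)).
    return x.mean() - c * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
```

Truncation biases the estimate downward, which is acceptable for a lower bound but loose; the paper's contribution is precisely a less conservative confidence bound than generic inequalities like this one.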
Pages: 3000-3006 (7 pages)
Related Papers (50 total; items 41-50 shown)
  • [41] Accountable Off-Policy Evaluation With Kernel Bellman Statistics
    Feng, Yihao
    Ren, Tongzheng
    Tang, Ziyang
    Liu, Qiang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [42] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [43] Off-Policy Proximal Policy Optimization
    Meng, Wenjia
    Zheng, Qian
    Pan, Gang
    Yin, Yilong
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9162 - 9170
  • [44] Average-Reward Off-Policy Policy Evaluation with Function Approximation
    Zhang, Shangtong
    Wan, Yi
    Sutton, Richard S.
    Whiteson, Shimon
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [45] A Nonparametric Off-Policy Policy Gradient
    Tosatto, Samuele
    Carvalho, Joao
    Abdulsamad, Hany
    Peters, Jan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [46] Boosted Off-Policy Learning
    London, Ben
    Lu, Levi
    Sandler, Ted
    Joachims, Thorsten
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [47] Supervised Off-Policy Ranking
    Jin, Yue
    Zhang, Yue
    Qin, Tao
    Zhang, Xudong
    Yuan, Jian
    Li, Houqiang
    Liu, Tie-Yan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 10323 - 10339
  • [48] Q(λ) with Off-Policy Corrections
    Harutyunyan, Anna
    Bellemare, Marc G.
    Stepleton, Tom
    Munos, Remi
    ALGORITHMIC LEARNING THEORY, (ALT 2016), 2016, 9925 : 305 - 320
  • [49] On the Relation between Policy Improvement and Off-Policy Minimum-Variance Policy Evaluation
    Metelli, Alberto Maria
    Meta, Samuele
    Restelli, Marcello
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 1423 - 1433
  • [50] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
    Yin, Ming
    Wang, Yu-Xiang
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108