High Confidence Off-Policy Evaluation

Times cited: 0
Authors
Thomas, Philip S. [1, 2]
Theocharous, Georgios [1]
Ghavamzadeh, Mohammad [1, 3]
Affiliations
[1] Adobe Res, San Jose, CA 95110 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] INRIA Lille, Lille, France
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
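To make the idea of a lower confidence bound on expected return concrete, here is a minimal sketch. It is not the method proposed in the paper: the abstract does not say which concentration inequality the authors use, so this example substitutes ordinary per-trajectory importance sampling with Hoeffding's inequality. The function names, the `(state, action, reward)` trajectory layout, and the policy signature `pi(action, state) -> probability` are all assumptions made for illustration. Hoeffding's inequality also requires that the importance-weighted returns lie in a known range `[0, upper]`, which the caller must supply.

```python
import math


def importance_weighted_return(trajectory, pi_e, pi_b, gamma=1.0):
    """Importance-weighted discounted return of one trajectory.

    trajectory: list of (state, action, reward) tuples (assumed layout).
    pi_e, pi_b: hypothetical callables (action, state) -> probability for
                the evaluation and behavior policies.
    """
    weight = 1.0
    discounted_return = 0.0
    for t, (state, action, reward) in enumerate(trajectory):
        # Product of per-step likelihood ratios along the trajectory.
        weight *= pi_e(action, state) / pi_b(action, state)
        discounted_return += (gamma ** t) * reward
    return weight * discounted_return


def hoeffding_lower_bound(samples, upper, delta):
    """(1 - delta) lower confidence bound on the mean of i.i.d. samples.

    Assumes every sample lies in [0, upper]. This is Hoeffding's
    inequality, swapped in for illustration only.
    """
    n = len(samples)
    mean = sum(samples) / n
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Usage: a one-step problem with two actions. The behavior policy is
# uniform; the evaluation policy always chooses action 1.
def pi_b(action, state):
    return 0.5


def pi_e(action, state):
    return 1.0 if action == 1 else 0.0


trajectories = [[(0, 1, 1.0)], [(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 0, 0.0)]]
samples = [importance_weighted_return(tr, pi_e, pi_b) for tr in trajectories]
# Largest possible weight is 2 and the largest return is 1, so upper = 2.
bound = hoeffding_lower_bound(samples, upper=2.0, delta=0.05)
print(bound)
```

With only four trajectories the bound is very loose. Hoeffding-style bounds scale with the largest possible importance weight, which grows quickly with trajectory length, so this sketch shows the shape of the computation and is not a practical estimator.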
Pages: 3000 - 3006
Page count: 7
Related papers
50 in total
  • [31] Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation
    Keramati, Ramtin
    Gottesman, Omer
    Celi, Leo Anthony
    Doshi-Velez, Finale
    Brunskill, Emma
    CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 174, 2022, 174 : 397 - 410
  • [32] Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
    Jiang, Nan
    Huang, Jiawei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [33] Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
    Wang, Yu-Xiang
    Agarwal, Alekh
    Dudik, Miroslav
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [34] Conformal Off-Policy Evaluation in Markov Decision Processes
    Foffano, Daniele
    Russo, Alessio
    Proutiere, Alexandre
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 3087 - 3094
  • [35] Balanced Off-Policy Evaluation in General Action Spaces
    Sondhi, Arjun
    Arbour, David
    Dimmery, Drew
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [36] More Robust Doubly Robust Off-policy Evaluation
    Farajtabar, Mehrdad
    Chow, Yinlam
    Ghavamzadeh, Mohammad
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [37] Combining Parametric and Nonparametric Models for Off-Policy Evaluation
    Gottesman, Omer
    Liu, Yao
    Sussex, Scott
    Brunskill, Emma
    Doshi-Velez, Finale
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [38] Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
    Namkoong, Hongseok
    Keramati, Ramtin
    Yadlowsky, Steve
    Brunskill, Emma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [39] Accountable Off-Policy Evaluation With Kernel Bellman Statistics
    Feng, Yihao
    Ren, Tongzheng
    Tang, Ziyang
    Liu, Qiang
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [40] Research on Off-Policy Evaluation in Reinforcement Learning: A Survey
    Wang S.-R.
    Niu W.-J.
    Tong E.-D.
    Chen T.
    Li H.
    Tian Y.-Z.
    Liu J.-Q.
    Han Z.
    Li Y.-D.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (09): 1926 - 1945