High Confidence Off-Policy Evaluation

Cited by: 0

Authors:

Thomas, Philip S. [1,2]

Theocharous, Georgios [1]

Ghavamzadeh, Mohammad [1,3]

Affiliations:

[1] Adobe Res, San Jose, CA 95110 USA

[2] Univ Massachusetts, Amherst, MA 01003 USA

[3] INRIA Lille, Lille, France

Source:

PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2015

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
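
As an illustration only, and not the method of the paper (whose details this record does not give), a lower confidence bound of the kind the abstract describes could be sketched by combining ordinary importance sampling with Hoeffding's inequality. The function names, the `(state, action, reward)` trajectory format, and the requirement that every importance-weighted return lies in `[0, upper]` are assumptions made for this sketch.

```python
import math


def importance_weighted_returns(trajectories, target_policy, behavior_policy, gamma=1.0):
    """Return one importance-weighted discounted return per trajectory.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_policy(s, a), behavior_policy(s, a): action probabilities
    (hypothetical callables; behavior_policy must be nonzero on observed actions).
    """
    weighted = []
    for trajectory in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # discounted return of the trajectory
        for t, (s, a, r) in enumerate(trajectory):
            weight *= target_policy(s, a) / behavior_policy(s, a)
            ret += (gamma ** t) * r
        weighted.append(weight * ret)
    return weighted


def hoeffding_lower_bound(samples, upper, delta):
    """1 - delta lower confidence bound on the mean of i.i.d. samples in [0, upper].

    Assumption: `upper` is a known a priori bound on every sample.
    """
    n = len(samples)
    mean = sum(samples) / n
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # Toy one-step problem: the behavior policy is uniform over two actions,
    # the target policy prefers action 1, and only action 1 is rewarded.
    behavior = lambda s, a: 0.5
    target = lambda s, a: 0.8 if a == 1 else 0.2
    data = [[(0, 1, 1.0)], [(0, 0, 0.0)]] * 500
    estimates = importance_weighted_returns(data, target, behavior)
    # The largest possible weighted return here is (0.8 / 0.5) * 1.0 = 1.6.
    print(hoeffding_lower_bound(estimates, upper=1.6, delta=0.05))
```

A Hoeffding-style bound scales with the largest possible importance-weighted return, so it can be very loose when importance weights are large; it is shown here only because it is simple to state.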


Pages: 3000-3006

Page count: 7

50 items in total

[31] Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation
Keramati, Ramtin
Gottesman, Omer
Celi, Leo Anthony
Doshi-Velez, Finale
Brunskill, Emma
CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 174, 2022, 174 : 397 - 410
[32] Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
Jiang, Nan
Huang, Jiawei
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[33] Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Wang, Yu-Xiang
Agarwal, Alekh
Dudik, Miroslav
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
[34] Conformal Off-Policy Evaluation in Markov Decision Processes
Foffano, Daniele
Russo, Alessio
Proutiere, Alexandre
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 3087 - 3094
[35] Balanced Off-Policy Evaluation in General Action Spaces
Sondhi, Arjun
Arbour, David
Dimmery, Drew
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
[36] More Robust Doubly Robust Off-policy Evaluation
Farajtabar, Mehrdad
Chow, Yinlam
Ghavamzadeh, Mohammad
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
[37] Combining Parametric and Nonparametric Models for Off-Policy Evaluation
Gottesman, Omer
Liu, Yao
Sussex, Scott
Brunskill, Emma
Doshi-Velez, Finale
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[38] Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Namkoong, Hongseok
Keramati, Ramtin
Yadlowsky, Steve
Brunskill, Emma
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[39] Accountable Off-Policy Evaluation With Kernel Bellman Statistics
Feng, Yihao
Ren, Tongzheng
Tang, Ziyang
Liu, Qiang
25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
[40] Research on Off-Policy Evaluation in Reinforcement Learning: A Survey
Wang S.-R.
Niu W.-J.
Tong E.-D.
Chen T.
Li H.
Tian Y.-Z.
Liu J.-Q.
Han Z.
Li Y.-D.
Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (09): 1926 - 1945
