High Confidence Off-Policy Evaluation

Cited by: 0

Authors:

Thomas, Philip S. [1,2]

Theocharous, Georgios [1]

Ghavamzadeh, Mohammad [1,3]

Affiliations:

[1] Adobe Res, San Jose, CA 95110 USA

[2] Univ Massachusetts, Amherst, MA 01003 USA

[3] INRIA Lille, Lille, France

Source:

PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2015

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
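The idea of a lower confidence bound on expected return can be illustrated with a minimal sketch. It is not the paper's own bound: it swaps in plain per-trajectory importance sampling with a clipped Hoeffding inequality, and it assumes nonnegative rewards so that clipping can only lower the mean. The function names, the `(state, action, reward)` trajectory layout, and the `clip` and `delta` parameters are hypothetical choices made for this illustration.

```python
import math


def importance_weighted_return(trajectory, pi_e, pi_b):
    """Importance-weighted return of one trajectory.

    trajectory: list of (state, action, reward) tuples (assumed layout).
    pi_e, pi_b: functions (state, action) -> probability under the
    evaluation policy and the behavior policy, respectively.
    Assumes rewards are nonnegative and pi_b(state, action) > 0.
    """
    weight = 1.0
    total_reward = 0.0
    for state, action, reward in trajectory:
        weight *= pi_e(state, action) / pi_b(state, action)
        total_reward += reward
    return weight * total_reward


def hoeffding_lower_bound(samples, clip, delta):
    """Lower bound holding with probability at least 1 - delta.

    Each sample is clipped to [0, clip]. For nonnegative samples the
    clipped mean cannot exceed the true mean, so a Hoeffding lower bound
    on the clipped mean is also a lower bound on the expected return.
    """
    n = len(samples)
    clipped = [min(x, clip) for x in samples]
    mean = sum(clipped) / n
    return mean - clip * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
```

A Hoeffding bound of this kind is typically loose when importance weights have a long tail, which is one reason to prefer a concentration inequality designed for that setting.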


Pages: 3000-3006

Page count: 7
