Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Cited by: 0

Authors:

Jiang, Nan [1]

Li, Lihong [2]

Affiliations:

[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA

[2] Microsoft Res, Beijing, Peoples R China

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48 | 2016 / Vol. 48

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL to real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators. We demonstrate the estimator's accuracy in several benchmark problems, and illustrate its use as a subroutine in safe policy improvement. We also provide theoretical results on the inherent hardness of the problem, and show that our estimator can match the lower bound in certain scenarios.
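The abstract does not spell out the estimator itself. As a rough illustration only, the sketch below applies a backward doubly robust recursion to a single trajectory: at each step the model's value estimate is corrected by an importance-weighted residual. The function name `dr_estimate`, the tabular integer states and actions, and the callables `pi_e`, `pi_b`, and `q_hat` are all assumptions made for this example, not names taken from the record or from the authors' code.

```python
from typing import Callable, Sequence, Tuple

# One logged step: (state, action, reward). Integer ids are an assumption.
Step = Tuple[int, int, float]


def dr_estimate(
    trajectory: Sequence[Step],
    pi_e: Callable[[int, int], float],   # target policy: probability of a in s
    pi_b: Callable[[int, int], float],   # behavior policy: probability of a in s
    q_hat: Callable[[int, int], float],  # approximate Q-function of the target policy
    n_actions: int,
    gamma: float = 1.0,
) -> float:
    """Doubly robust value estimate for one trajectory (illustrative sketch)."""
    v_dr = 0.0  # estimate for the empty tail of the trajectory
    for s, a, r in reversed(trajectory):
        # Model-based state value under the target policy.
        v_hat = sum(pi_e(s, b) * q_hat(s, b) for b in range(n_actions))
        # Per-step importance ratio between target and behavior policy.
        rho = pi_e(s, a) / pi_b(s, a)
        # Model estimate plus importance-weighted correction of its error.
        v_dr = v_hat + rho * (r + gamma * v_dr - q_hat(s, a))
    return v_dr
```

With `q_hat` identically zero the recursion reduces to step-wise importance sampling; when `q_hat` is accurate, the correction term is small, which is the source of the variance reduction the abstract mentions. In practice the estimate would be averaged over many logged trajectories.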

Pages: 10

50 items in total

[1] Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
Kallus, Nathan
Mao, Xiaojie
Wang, Kaiwen
Zhou, Zhengyuan
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 10598 - 10632
[2] More Robust Doubly Robust Off-policy Evaluation
Farajtabar, Mehrdad
Chow, Yinlam
Ghavamzadeh, Mohammad
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
[3] Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
Kallus, Nathan
Uehara, Masatoshi
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
[4] Continuous Value Assignment: A Doubly Robust Data Augmentation for Off-Policy Learning
Lin, Junfan
Huang, Zhongzhan
Wang, Keze
Liu, Lingbo
Lin, Liang
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
[5] A perspective on off-policy evaluation in reinforcement learning
Li, Lihong
[J]. FRONTIERS OF COMPUTER SCIENCE, 2019, 13 (05) : 911 - 912
[6] A perspective on off-policy evaluation in reinforcement learning
Lihong Li
[J]. Frontiers of Computer Science, 2019, 13 : 911 - 912
[7] Reliable Off-Policy Evaluation for Reinforcement Learning
Wang, Jie
Gao, Rui
Zha, Hongyuan
[J]. OPERATIONS RESEARCH, 2024, 72 (02) : 699 - 716
[8] Research on Off-Policy Evaluation in Reinforcement Learning: A Survey
Wang, Shuo-Ru
Niu, Wen-Jia
Tong, En-Dong
Chen, Tong
Li, He
Tian, Yun-Zhe
Liu, Ji-Qiang
Han, Zhen
Li, Yi-Dong
[J]. Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (09) : 1926 - 1945
[9] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Thomas, Philip S.
Brunskill, Emma
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[10] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
Yin, Ming
Wang, Yu-Xiang
[J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
