Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Cited by: 0
Authors
Jiang, Nan [1 ]
Li, Lihong [2 ]
Affiliations
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
[2] Microsoft Res, Beijing, Peoples R China
Keywords
DOI
None available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL to real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators. We demonstrate the estimator's accuracy in several benchmark problems, and illustrate its use as a subroutine in safe policy improvement. We also provide theoretical results on the inherent hardness of the problem, and show that our estimator can match the lower bound in certain scenarios.
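For reference, the sequential doubly robust (DR) estimator described in the abstract can be sketched as follows. This is a minimal illustration under assumed conditions (finite horizon, finite action space, known behavior-policy probabilities), not the authors' reference implementation; the function and argument names (doubly_robust_value, pi_e, pi_b, q_hat) are illustrative only.

```python
import numpy as np

def doubly_robust_value(episodes, actions, pi_e, pi_b, q_hat, gamma=1.0):
    """Sequential doubly robust (DR) off-policy value estimate.

    episodes : trajectories collected under the behavior policy,
               each a list of (state, action, reward) steps.
    actions  : iterable of all actions (finite action space assumed).
    pi_e(a, s), pi_b(a, s) : target / behavior policy action probabilities.
    q_hat(s, a) : approximate action-value model for the target policy.
    """
    estimates = []
    for traj in episodes:
        dr = 0.0
        # Recursion evaluated backwards from the end of the trajectory:
        # DR_t = V_hat(s_t) + rho_t * (r_t + gamma * DR_{t+1} - Q_hat(s_t, a_t))
        for (s, a, r) in reversed(traj):
            rho = pi_e(a, s) / pi_b(a, s)  # per-step importance weight
            v_hat = sum(pi_e(b, s) * q_hat(s, b) for b in actions)  # model-based state value
            dr = v_hat + rho * (r + gamma * dr - q_hat(s, a))
        estimates.append(dr)
    return float(np.mean(estimates))

# Toy usage: one state, two actions, uniform behavior policy.
actions = [0, 1]
pi_b = lambda a, s: 0.5
pi_e = lambda a, s: 0.9 if a == 1 else 0.1
q_hat = lambda s, a: 1.0 if a == 1 else 0.0
episodes = [[("s0", 1, 1.0)], [("s0", 0, 0.0)]]
print(doubly_robust_value(episodes, actions, pi_e, pi_b, q_hat))
```

Intuitively, when q_hat is accurate the correction term r + gamma * DR - Q_hat is small, which is what lowers the variance relative to plain importance sampling while the importance weighting keeps the estimate unbiased.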
Pages: 10
Related Papers
50 records in total
  • [1] Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
    Kallus, Nathan
    Mao, Xiaojie
    Wang, Kaiwen
    Zhou, Zhengyuan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 10598 - 10632
  • [2] More Robust Doubly Robust Off-policy Evaluation
    Farajtabar, Mehrdad
    Chow, Yinlam
    Ghavamzadeh, Mohammad
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [3] Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
    Kallus, Nathan
    Uehara, Masatoshi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [4] Continuous Value Assignment: A Doubly Robust Data Augmentation for Off-Policy Learning
    Lin, Junfan
    Huang, Zhongzhan
    Wang, Keze
    Liu, Lingbo
    Lin, Liang
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [5] A perspective on off-policy evaluation in reinforcement learning
    Li, Lihong
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2019, 13 (05) : 911 - 912
  • [6] A perspective on off-policy evaluation in reinforcement learning
    Lihong Li
    [J]. Frontiers of Computer Science, 2019, 13 : 911 - 912
  • [7] Reliable Off-Policy Evaluation for Reinforcement Learning
    Wang, Jie
    Gao, Rui
    Zha, Hongyuan
    [J]. OPERATIONS RESEARCH, 2024, 72 (02) : 699 - 716
  • [8] Research on Off-Policy Evaluation in Reinforcement Learning: A Survey
    Wang, Shuo-Ru
    Niu, Wen-Jia
    Tong, En-Dong
    Chen, Tong
    Li, He
    Tian, Yun-Zhe
    Liu, Ji-Qiang
    Han, Zhen
    Li, Yi-Dong
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (09): : 1926 - 1945
  • [9] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [10] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
    Yin, Ming
    Wang, Yu-Xiang
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108