Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Times Cited: 0
Authors
Uehara, Masatoshi [1 ,2 ]
Kiyohara, Haruka [2 ,7 ]
Bennett, Andrew [2 ,3 ]
Chernozhukov, Victor [4 ]
Jiang, Nan [5 ]
Kallus, Nathan [2 ]
Shi, Chengchun [6 ]
Sun, Wen [2 ]
Affiliations
[1] Genentech Inc, San Francisco, CA 94080 USA
[2] Cornell Univ, Ithaca, NY 14853 USA
[3] Morgan Stanley, New York, NY USA
[4] MIT, Cambridge, MA 02139 USA
[5] UIUC, Champaign, IL USA
[6] LSE, London, England
[7] Tokyo Inst Technol, Tokyo, Japan
Funding
UK Engineering and Physical Sciences Research Council; US National Science Foundation;
Keywords
VARIABLES; MODELS; COMPLEXITY;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs and play a role analogous to that of classical value functions in fully observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We establish a PAC result implying that our OPE estimator is close to the true policy value under Bellman completeness, as long as futures and histories contain sufficient information about latent states. Our code is available at https://github.com/aiueola/neurips2023-future-dependent-ope.
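For orientation, the conditional-moment (instrumental-variable) form and the minimax objective referred to in the abstract can be sketched roughly as below. The specific symbols are assumptions made for illustration and may differ from the paper's exact notation: F and F' denote current and next future proxies, H the history proxy serving as the instrument, O, A, R the current observation, action, and reward, μ the per-step importance ratio between the evaluation and behavior policies, Ξ a class of test functions, and λ a stabilizing penalty.

```latex
% Schematic conditional-moment form of the off-policy Bellman equation for a
% future-dependent value function g, with the history proxy H as the instrument
% (notation assumed for illustration, not taken from the record above):
\[
  \mathbb{E}\!\left[\, \mu(O, A)\bigl(R + \gamma\, g(F')\bigr) - g(F) \;\middle|\; H \,\right] = 0,
  \qquad
  \mu(O, A) := \frac{\pi^{e}(A \mid O)}{\pi^{b}(A \mid O)} .
\]
% Schematic minimax (adversarial) estimation of g over a class \mathcal{G},
% with test functions \xi \in \Xi acting on the instrument H and a stabilizer \lambda:
\[
  \hat{g} \in \arg\min_{g \in \mathcal{G}} \; \max_{\xi \in \Xi} \;
  \mathbb{E}_{n}\!\left[\, \xi(H)\,\bigl\{ \mu(O, A)\bigl(R + \gamma\, g(F')\bigr) - g(F) \bigr\} \,\right]
  \; - \; \lambda\, \mathbb{E}_{n}\!\left[\xi(H)^{2}\right].
\]
```

The test-function class over the history proxy plays the role of the instrument set in conditional-moment estimation; the policy value is then read off from the learned g under the initial distribution of futures, per the method described in the abstract.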
Pages: 18