Data-Efficient Policy Evaluation Through Behavior Policy Search

Cited by: 0
Authors
Hanna, Josiah P. [1]
Thomas, Philip S. [2, 3]
Stone, Peter [1]
Niekum, Scott [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance. We show that the data collected from deploying a different policy, commonly called the behavior policy, can be used to produce unbiased estimates with lower mean squared error than this standard technique. We derive an analytic expression for the optimal behavior policy, the behavior policy that minimizes the mean squared error of the resulting estimates. Because this expression depends on terms that are unknown in practice, we propose a novel policy evaluation sub-problem, behavior policy search: searching for a behavior policy that reduces mean squared error. We present a behavior policy search algorithm and empirically demonstrate its effectiveness in lowering the mean squared error of policy performance estimates.
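The claim that data from a different behavior policy can give unbiased estimates with lower mean squared error can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-step MDP with two actions, the reward values, the names `PI_E`, `PI_B`, `is_estimate`, and `mse`, and the choice of behavior policy. The behavior policy used here is the textbook zero-variance importance-sampling proposal for positive deterministic rewards (probability proportional to the evaluation policy's probability times the reward); the abstract does not state the paper's own expression, so this should not be read as the authors' result or algorithm.

```python
import random

# Hypothetical one-step MDP (a two-action bandit) with deterministic rewards.
REWARDS = [1.0, 100.0]
PI_E = [0.9, 0.1]  # evaluation policy: the policy whose value we want

# True expected return of the evaluation policy (known only in this toy setup).
TRUE_VALUE = sum(p * r for p, r in zip(PI_E, REWARDS))

# Assumed behavior policy: pi_b(a) proportional to pi_e(a) * r(a).
# It relies on the true rewards, which are unknown in practice.
PI_B = [p * r / TRUE_VALUE for p, r in zip(PI_E, REWARDS)]


def is_estimate(pi_b, n, rng):
    """Importance-sampling estimate from n one-step episodes drawn from pi_b.

    With pi_b equal to PI_E every weight is 1, so this reduces to the
    ordinary on-policy Monte Carlo average.
    """
    total = 0.0
    for _ in range(n):
        a = rng.choices([0, 1], weights=pi_b)[0]
        total += PI_E[a] / pi_b[a] * REWARDS[a]
    return total / n


def mse(pi_b, n=10, trials=2000, seed=0):
    """Empirical mean squared error of the estimator over repeated trials."""
    rng = random.Random(seed)
    errors = ((is_estimate(pi_b, n, rng) - TRUE_VALUE) ** 2 for _ in range(trials))
    return sum(errors) / trials


if __name__ == "__main__":
    print("on-policy MSE:      ", mse(PI_E))
    print("behavior-policy MSE:", mse(PI_B))
```

In this toy case every importance-weighted sample from `PI_B` equals the true value up to floating-point error, so the error is essentially zero, while on-policy sampling rarely sees the high-reward action and has a large error. Because `PI_B` is built from the true rewards, it cannot be computed directly in practice, which is the gap a search over behavior policies is meant to address.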
Pages: 10
Related papers
50 in total
  • [1] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [2] Black-Box Data-efficient Policy Search for Robotics
    Chatzilygeroudis, Konstantinos
    Rama, Roberto
    Kaushik, Rituraj
    Goepp, Dorian
    Vassiliades, Vassilis
    Mouret, Jean-Baptiste
    [J]. 2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 51 - 58
  • [3] Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
    Zhong, Rujie
    Zhang, Duohan
    Schafer, Lukas
    Albrecht, Stefano V.
    Hanna, Josiah P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] Data-Efficient Control Policy Search using Residual Dynamics Learning
    Saveriano, Matteo
    Yin, Yuchao
    Falco, Pietro
    Lee, Dongheui
    [J]. 2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 4709 - 4715
  • [5] Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
    Pautrat, Remi
    Chatzilygeroudis, Konstantinos
    Mouret, Jean-Baptiste
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 7571 - 7578
  • [6] Fast Model Identification via Physics Engines for Data-Efficient Policy Search
    Zhu, Shaojun
    Kimmel, Andrew
    Bekris, Kostas E.
    Boularias, Abdeslam
    [J]. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 3249 - 3256
  • [7] Model-based contextual policy search for data-efficient generalization of robot skills
    Kupcsik, Andras
    Deisenroth, Marc Peter
    Peters, Jan
    Poh, Loh Ai
    Vadakkepat, Prahlad
    Neumann, Gerhard
    [J]. ARTIFICIAL INTELLIGENCE, 2017, 247 : 415 - 439
  • [8] SafePILCO: A Software Tool for Safe and Data-Efficient Policy Synthesis
    Polymenakos, Kyriakos
    Rontsis, Nikitas
    Abate, Alessandro
    Roberts, Stephen
    [J]. QUANTITATIVE EVALUATION OF SYSTEMS (QEST 2020), 2020, 12289 : 18 - 26
  • [9] Data-efficient Hindsight Off-policy Option Learning
    Wulfmeier, Markus
    Rao, Dushyant
    Hafner, Roland
    Lampe, Thomas
    Abdolmaleki, Abbas
    Hertweck, Tim
    Neunert, Michael
    Tirumala, Dhruva
    Siegel, Noah
    Heess, Nicolas
    Riedmiller, Martin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [10] Sparse Gaussian Processes-based Black-Box Data-efficient Policy Search for Robotics
    Rong, Chunyan
    Huang, Jingyi
    Rosendo, Andre
    [J]. 2021 20TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS (ICAR), 2021, : 468 - 473