Data-Efficient Policy Evaluation Through Behavior Policy Search

Cited by: 0

Authors:

Hanna, Josiah P. [1]

Thomas, Philip S. [2,3]

Stone, Peter [1]

Niekum, Scott [1]

Affiliations:

[1] Univ Texas Austin, Austin, TX 78712 USA

[2] Univ Massachusetts, Amherst, MA 01003 USA

[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70 | 2017, Vol. 70

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance. We show that the data collected from deploying a different policy, commonly called the behavior policy, can be used to produce unbiased estimates with lower mean squared error than this standard technique. We derive an analytic expression for the optimal behavior policy, that is, the behavior policy that minimizes the mean squared error of the resulting estimates. Because this expression depends on terms that are unknown in practice, we propose a novel policy evaluation sub-problem, behavior policy search: searching for a behavior policy that reduces mean squared error. We present a behavior policy search algorithm and empirically demonstrate its effectiveness in lowering the mean squared error of policy performance estimates.
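The claim that a different behavior policy can yield unbiased estimates with lower mean squared error can be illustrated with ordinary importance sampling on a toy problem. The sketch below is not the paper's algorithm or its analytic expression. It assumes a hypothetical one-step MDP with two actions and deterministic positive rewards, and it uses the textbook zero-variance importance sampling choice (behavior probabilities proportional to target probability times reward). All names and numbers are illustrative assumptions. Moments are computed exactly rather than sampled, so the comparison is deterministic.

```python
# Toy illustration (assumed setup, not from the paper): a one-step MDP with
# two actions, a fixed target policy `pi`, and deterministic rewards `r`.

def is_moments(pi, b, r):
    """Exact mean and variance of the importance-sampled return
    (pi[a] / b[a]) * r[a] when the action a is drawn from b."""
    values = [(p / q) * x for p, q, x in zip(pi, b, r)]
    mean = sum(q * v for q, v in zip(b, values))
    second_moment = sum(q * v * v for q, v in zip(b, values))
    return mean, second_moment - mean * mean

pi = [0.9, 0.1]   # target policy to evaluate (hypothetical)
r = [1.0, 10.0]   # reward of each action (hypothetical)

# True expected return of the target policy.
j_true = sum(p * x for p, x in zip(pi, r))

# Standard technique: deploy pi itself, so every importance weight is 1.
mc_mean, mc_var = is_moments(pi, pi, r)

# Alternative behavior policy: probabilities proportional to pi[a] * |r[a]|.
weights = [p * abs(x) for p, x in zip(pi, r)]
total = sum(weights)
behavior = [w / total for w in weights]
is_mean, is_var = is_moments(pi, behavior, r)

# Both estimators are unbiased, so mean squared error equals variance.
print("true value:", j_true)
print("on-policy mean/variance:", mc_mean, mc_var)    # variance about 7.29
print("behavior-policy mean/variance:", is_mean, is_var)  # variance about 0
```

In this toy case the behavior policy depends on the rewards themselves, which are exactly what evaluation is trying to estimate. That mirrors the abstract's point that the optimal behavior policy depends on terms unknown in practice, which motivates searching for a good behavior policy instead.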

Pages: 10

50 items in total

[1] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Thomas, Philip S.
Brunskill, Emma
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[2] Black-Box Data-efficient Policy Search for Robotics
Chatzilygeroudis, Konstantinos
Rama, Roberto
Kaushik, Rituraj
Goepp, Dorian
Vassiliades, Vassilis
Mouret, Jean-Baptiste
[J]. 2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 51 - 58
[3] Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Zhong, Rujie
Zhang, Duohan
Schafer, Lukas
Albrecht, Stefano V.
Hanna, Josiah P.
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[4] Data-Efficient Control Policy Search using Residual Dynamics Learning
Saveriano, Matteo
Yin, Yuchao
Falco, Pietro
Lee, Dongheui
[J]. 2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 4709 - 4715
[5] Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
Pautrat, Remi
Chatzilygeroudis, Konstantinos
Mouret, Jean-Baptiste
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 7571 - 7578
[6] Fast Model Identification via Physics Engines for Data-Efficient Policy Search
Zhu, Shaojun
Kimmel, Andrew
Bekris, Kostas E.
Boularias, Abdeslam
[J]. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 3249 - 3256
[7] Model-based contextual policy search for data-efficient generalization of robot skills
Kupcsik, Andras
Deisenroth, Marc Peter
Peters, Jan
Poh, Loh Ai
Vadakkepat, Prahlad
Neumann, Gerhard
[J]. ARTIFICIAL INTELLIGENCE, 2017, 247 : 415 - 439
[8] SafePILCO: A Software Tool for Safe and Data-Efficient Policy Synthesis
Polymenakos, Kyriakos
Rontsis, Nikitas
Abate, Alessandro
Roberts, Stephen
[J]. QUANTITATIVE EVALUATION OF SYSTEMS (QEST 2020), 2020, 12289 : 18 - 26
[9] Data-efficient Hindsight Off-policy Option Learning
Wulfmeier, Markus
Rao, Dushyant
Hafner, Roland
Lampe, Thomas
Abdolmaleki, Abbas
Hertweck, Tim
Neunert, Michael
Tirumala, Dhruva
Siegel, Noah
Heess, Nicolas
Riedmiller, Martin
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[10] Sparse Gaussian Processes-based Black-Box Data-efficient Policy Search for Robotics
Rong, Chunyan
Huang, Jingyi
Rosendo, Andre
[J]. 2021 20TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS (ICAR), 2021, : 468 - 473
