Approximate policy iteration with a policy language bias: Solving relational Markov decision processes

Cited by: 0
Authors
Fern, Alan [1 ]
Yoon, Sungwook [2 ]
Givan, Robert [2 ]
Affiliations
[1] School of Electrical Engineering and Computer Science, Oregon State University, United States
[2] School of Electrical and Computer Engineering, Purdue University, United States
Abstract
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. © 2006 AI Access Foundation. All rights reserved.
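To make the core idea concrete, the sketch below shows rollout-based approximate policy iteration carried out in policy space: sample states, label each with the action that rollouts under the current policy rank best, then fit a new policy to those labels. It is a minimal illustration under assumptions, not the authors' system: the toy chain MDP, the parameter values, and the table-based `learn_policy` stand-in (the paper uses a relational policy language and learner) are all illustrative.

```python
# Hedged sketch of policy-space API via Monte Carlo rollouts.
# The chain MDP, horizons, and the memorizing "learner" are assumptions
# for illustration only, not the paper's relational policy learner.
import random

ACTIONS = [-1, +1]          # toy chain MDP: move left or right
GOAL, N_STATES = 9, 10      # goal is state 9 on a 10-state chain

def step(state, action):
    """Toy stochastic transition: intended move succeeds with prob 0.9."""
    if random.random() < 0.9:
        nxt = min(max(state + action, 0), N_STATES - 1)
    else:
        nxt = state
    return nxt, (1.0 if nxt == GOAL else 0.0)

def rollout_q(state, action, policy, horizon=30, n_rollouts=20, gamma=0.95):
    """Monte Carlo estimate of Q^pi(state, action)."""
    total = 0.0
    for _ in range(n_rollouts):
        s, r = step(state, action)          # take the candidate action
        ret, disc = r, 1.0
        for _ in range(horizon):            # then follow the current policy
            disc *= gamma
            s, r = step(s, policy(s))
            ret += disc * r
        total += ret
    return total / n_rollouts

def improved_action(state, policy):
    """One step of policy improvement at `state`, via rollouts."""
    return max(ACTIONS, key=lambda a: rollout_q(state, a, policy))

def learn_policy(examples):
    """Stand-in for the paper's relational learner: memorize the
    (state -> action) training pairs and default to moving right."""
    table = dict(examples)
    return lambda s: table.get(s, +1)

# API loop in policy space: sample states, label them with
# rollout-improved actions, fit a new policy to those labels.
policy = lambda s: random.choice(ACTIONS)   # uninformed initial policy
for iteration in range(3):
    states = [random.randrange(N_STATES) for _ in range(30)]
    examples = [(s, improved_action(s, policy)) for s in states]
    policy = learn_policy(examples)

print([policy(s) for s in range(N_STATES)])  # mostly +1, toward the goal
```

Note that in this toy chain the uninformed initial policy still stumbles onto reward, so no bootstrapping is needed; the random-walk bootstrapping the abstract describes addresses the sparse-reward case where such rollouts would otherwise return all-zero estimates.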
DOI
Not available
Document type
Journal article (JA)
Pages: 75-118
Related Papers
50 records in total
  • [31] An evolutionary random policy search algorithm for solving Markov decision processes
    Hu, Jiaqiao
    Fu, Michael C.
    Ramezani, Vahid R.
    Marcus, Steven I.
    INFORMS JOURNAL ON COMPUTING, 2007, 19 (02) : 161 - 174
  • [32] Approximate Policy Iteration for Semi-Markov Control Revisited
    Gosavi, Abhijit
    COMPLEX ADAPTIVE SYSTEMS, 2011, 6
  • [33] Verification of General Markov Decision Processes by Approximate Similarity Relations and Policy Refinement
    Haesaert, Sofie
    Abate, Alessandro
    Van den Hof, Paul M. J.
    QUANTITATIVE EVALUATION OF SYSTEMS, QEST 2016, 2016, 9826 : 227 - 243
  • [34] Verification of General Markov Decision Processes by Approximate Similarity Relations and Policy Refinement
    Haesaert, Sofie
    Soudjani, Sadegh Esmaeil Zadeh
    Abate, Alessandro
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2017, 55 (04) : 2333 - 2367
  • [35] Temporal logic control of general Markov decision processes by approximate policy refinement
    Haesaert, Sofie
    Soudjani, Sadegh
    Abate, Alessandro
    IFAC PAPERSONLINE, 2018, 51 (16) : 73 - 78
  • [36] Approximate robust policy iteration for discounted infinite-horizon Markov decision processes with uncertain stationary parametric transition matrices
    Li, Baohua
    Si, Jennie
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 2052 - 2057
  • [38] The policy iteration algorithm for average reward Markov decision processes with general state space
    Meyn, SP
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (12) : 1663 - 1680
  • [39] Average optimality for continuous-time Markov decision processes with a policy iteration approach
    Zhu, Quanxin
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2008, 339 (01) : 691 - 704
  • [40] Solving Common-Payoff Games with Approximate Policy Iteration
    Sokota, Samuel
    Lockhart, Edward
    Timbers, Finbarr
    Davoodi, Elnaz
    D'Orazio, Ryan
    Burch, Neil
    Schmid, Martin
    Bowling, Michael
    Lanctot, Marc
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9695 - 9703