Approximate policy iteration with a policy language bias: Solving relational markov decision processes

Cited by: 0
Authors
Fern, Alan [1 ]
Yoon, Sungwook [2 ]
Givan, Robert [2 ]
Institutions
[1] School of Electrical Engineering and Computer Science, Oregon State University, United States
[2] School of Electrical and Computer Engineering, Purdue University, United States
Abstract
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. © 2006 AI Access Foundation. All rights reserved.
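The abstract describes an API loop that, in each iteration, uses rollouts of the current policy to identify improved action choices and then fits a new policy directly to those choices. The following is a minimal illustrative sketch of that generic scheme, under assumptions of my own: a toy deterministic chain MDP (`chain_step`), a table-based "policy learner", and Monte Carlo rollout parameters are all hypothetical stand-ins, not the paper's relational policy language or learner.

```python
# Sketch of approximate policy iteration (API) with learning in policy space.
# Assumptions for illustration: a tiny enumerable MDP and a lookup-table
# "learner"; the paper instead learns a compact relational policy from the
# same kind of (state, improved action) training data.

def rollout_value(step, state, action, policy, depth, width):
    """Monte Carlo estimate of Q(state, action) under `policy`."""
    total = 0.0
    for _ in range(width):
        s, ret = step(state, action)
        for _ in range(depth - 1):
            s, r = step(s, policy(s))
            ret += r
        total += ret
    return total / width

def api_policy_space(step, states, actions, policy, iters=5, depth=10, width=20):
    """Each iteration: (1) use rollouts to find the best action in each
    sampled state under the current policy, (2) fit a new policy to the
    resulting (state, action) pairs."""
    for _ in range(iters):
        data = {s: max(actions,
                       key=lambda a: rollout_value(step, s, a, policy, depth, width))
                for s in states}
        policy = lambda s, table=dict(data): table.get(s, actions[0])
    return policy

# Demo on a hypothetical 6-state chain: actions +1/-1, reward on reaching state 5.
def chain_step(s, a):
    s2 = min(max(s + a, 0), 5)
    return s2, (1.0 if s2 == 5 else 0.0)

start = lambda s: -1  # an uninformed initial policy that always moves away
learned = api_policy_space(chain_step, range(6), [+1, -1], start)
```

In this toy setting the learned policy moves toward the goal from every state. Note how the demo also illustrates the bootstrapping problem the abstract raises: with sparse reward, rollouts from an uninformed policy see no signal unless some sampled state is already near the goal, which is what the paper's random-walk bootstrapping addresses.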
DOI
Not available
Document type
Journal article (JA)
Pages: 75-118
Related papers
50 records in total
  • [21] Approximate Newton methods for policy search in markov decision processes
    Furmston, Thomas
    Lever, Guy
    Barber, David
    Journal of Machine Learning Research, 2016, 17: 1-51
  • [23] Policy iteration type algorithms for recurrent state Markov decision processes
    Patek, SD
    COMPUTERS & OPERATIONS RESEARCH, 2004, 31(14): 2333-2347
  • [24] COMPUTATIONAL COMPARISON OF POLICY ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION PROCESSES
    HARTLEY, R
    LAVERCOMBE, AC
    THOMAS, LC
    COMPUTERS & OPERATIONS RESEARCH, 1986, 13(04): 411-420
  • [25] Partial policy iteration for L1-Robust Markov decision processes
    Ho, Chin Pang
    Petrik, Marek
    Wiesemann, Wolfram
    Journal of Machine Learning Research, 2021, 22
  • [27] A note on the convergence of policy iteration in Markov decision processes with compact action spaces
    Golubin, AY
    MATHEMATICS OF OPERATIONS RESEARCH, 2003, 28(01): 194-200
  • [28] Inexact GMRES Policy Iteration for Large-Scale Markov Decision Processes
    Gargiani, Matilde
    Liao-McPherson, Dominic
    Zanelli, Andrea
    Lygeros, John
    IFAC PAPERSONLINE, 2023, 56(02): 11249-11254
  • [29] Potential-based online policy iteration algorithms for Markov decision processes
    Fang, HT
    Cao, XR
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, 49(04): 493-505
  • [30] Robust topological policy iteration for infinite horizon bounded Markov Decision Processes
    Silva Reis, Willy Arthur
    de Barros, Leliane Nunes
    Delgado, Karina Valdivia
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2019, 105: 287-304