Approximate policy iteration with a policy language bias: Solving relational Markov decision processes

Cited by: 0

Authors:

Fern, Alan [1]

Yoon, Sungwook [2]

Givan, Robert [2]

Affiliations:

[1] School of Electrical Engineering and Computer Science, Oregon State University, United States

[2] School of Electrical and Computer Engineering, Purdue University, United States

Source:

Journal of Artificial Intelligence Research | 2006 / Vol. 25

Abstract:

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. © 2006 AI Access Foundation. All rights reserved.
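The policy-space variant of API described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: a six-state deterministic chain with reward only at the goal, rollouts that estimate Q-values under the current policy, and a `learn_policy` stand-in that memorizes labels in a table instead of learning in a relational policy language. The paper's random-walk bootstrapping is not implemented here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy domain: states 0..5 on a chain, state 5 is the goal.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)    # move left / move right
GAMMA = 0.9           # discount factor (assumed)
HORIZON = 10          # rollout length (assumed)

Policy = Callable[[int], int]


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Deterministic transition; reward 1.0 only on reaching the goal."""
    nxt = min(max(state + action, 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def rollout_q(state: int, action: int, policy: Policy) -> float:
    """Estimate Q(state, action): take `action`, then follow `policy`."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(HORIZON):
        s, r, done = step(s, a)
        total += discount * r
        if done:
            break
        discount *= GAMMA
        a = policy(s)
    return total


def improved_training_set(policy: Policy) -> List[Tuple[int, int]]:
    """Label each non-goal state with its rollout-greedy action."""
    data = []
    for s in range(GOAL):
        best = policy(s)                      # keep current action on ties
        best_q = rollout_q(s, best, policy)
        for a in ACTIONS:
            q = rollout_q(s, a, policy)
            if q > best_q:
                best, best_q = a, q
        data.append((s, best))
    return data


def learn_policy(data: List[Tuple[int, int]]) -> Policy:
    """Stand-in learner: memorizes the labeled actions in a table."""
    table: Dict[int, int] = dict(data)
    return lambda s: table.get(s, -1)


def approximate_policy_iteration(policy: Policy, iterations: int) -> Policy:
    """Alternate rollout-based labeling with a policy-learning step."""
    for _ in range(iterations):
        policy = learn_policy(improved_training_set(policy))
    return policy


def uninformed(_: int) -> int:
    """Uninformed initial policy: always move left, away from the goal."""
    return -1


one_step = approximate_policy_iteration(uninformed, 1)
final = approximate_policy_iteration(uninformed, GOAL)
print([one_step(s) for s in range(GOAL)])   # → [-1, -1, -1, -1, 1]
print([final(s) for s in range(GOAL)])      # → [1, 1, 1, 1, 1]
```

In this toy setting, each iteration corrects only the state adjacent to the region where the policy already reaches the goal, so an uninformed start needs as many iterations as the chain is long. This is a small-scale analogue of the sparse-reward difficulty that the abstract gives as the motivation for bootstrapping.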

DOI:

Not available

CLC number:

Subject classification code:

Document type:

Journal article (JA)


Pages: 75-118

