Approximate policy iteration with a policy language bias: Solving relational markov decision processes

Cited by: 0
Authors
Fern, Alan [1 ]
Yoon, Sungwook [2 ]
Givan, Robert [2 ]
Institutions
[1] School of Electrical Engineering and Computer Science, Oregon State University, United States
[2] School of Electrical and Computer Engineering, Purdue University, United States
Abstract
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. © 2006 AI Access Foundation. All rights reserved.
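The abstract describes an API loop that, in each iteration, uses rollouts of the current policy to identify improved action choices and then fits a new policy directly to those choices. The following is a minimal illustrative sketch of that generic scheme, under assumptions of my own: a toy deterministic chain MDP (`chain_step`), a table-based "policy learner", and Monte Carlo rollout parameters are all hypothetical stand-ins, not the paper's relational policy language or learner.

```python
# Sketch of approximate policy iteration (API) with learning in policy space.
# Assumptions for illustration: a tiny enumerable MDP and a lookup-table
# "learner"; the paper instead learns a compact relational policy from the
# same kind of (state, improved action) training data.

def rollout_value(step, state, action, policy, depth, width):
    """Monte Carlo estimate of Q(state, action) under `policy`."""
    total = 0.0
    for _ in range(width):
        s, ret = step(state, action)
        for _ in range(depth - 1):
            s, r = step(s, policy(s))
            ret += r
        total += ret
    return total / width

def api_policy_space(step, states, actions, policy, iters=5, depth=10, width=20):
    """Each iteration: (1) use rollouts to find the best action in each
    sampled state under the current policy, (2) fit a new policy to the
    resulting (state, action) pairs."""
    for _ in range(iters):
        data = {s: max(actions,
                       key=lambda a: rollout_value(step, s, a, policy, depth, width))
                for s in states}
        policy = lambda s, table=dict(data): table.get(s, actions[0])
    return policy

# Demo on a hypothetical 6-state chain: actions +1/-1, reward on reaching state 5.
def chain_step(s, a):
    s2 = min(max(s + a, 0), 5)
    return s2, (1.0 if s2 == 5 else 0.0)

start = lambda s: -1  # an uninformed initial policy that always moves away
learned = api_policy_space(chain_step, range(6), [+1, -1], start)
```

In this toy setting the learned policy moves toward the goal from every state. Note how the demo also illustrates the bootstrapping problem the abstract raises: with sparse reward, rollouts from an uninformed policy see no signal unless some sampled state is already near the goal, which is what the paper's random-walk bootstrapping addresses.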
DOI
Not available
Document type
Journal article (JA)
Pages: 75-118
Related papers
50 records in total
  • [21] Approximate Newton methods for policy search in markov decision processes
    Furmston, Thomas
    Lever, Guy
    Barber, David
    Journal of Machine Learning Research, 2016, 17: 1-51
  • [23] Policy iteration type algorithms for recurrent state Markov decision processes
    Patek, SD
    COMPUTERS & OPERATIONS RESEARCH, 2004, 31(14): 2333-2347
  • [24] COMPUTATIONAL COMPARISON OF POLICY ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION PROCESSES
    HARTLEY, R
    LAVERCOMBE, AC
    THOMAS, LC
    COMPUTERS & OPERATIONS RESEARCH, 1986, 13(04): 411-420
  • [25] Partial policy iteration for L1-Robust Markov decision processes
    Ho, Chin Pang
    Petrik, Marek
    Wiesemann, Wolfram
    Journal of Machine Learning Research, 2021, 22
  • [27] A note on the convergence of policy iteration in Markov decision processes with compact action spaces
    Golubin, AY
    MATHEMATICS OF OPERATIONS RESEARCH, 2003, 28(01): 194-200
  • [28] Inexact GMRES Policy Iteration for Large-Scale Markov Decision Processes
    Gargiani, Matilde
    Liao-McPherson, Dominic
    Zanelli, Andrea
    Lygeros, John
    IFAC PAPERSONLINE, 2023, 56(02): 11249-11254
  • [29] Potential-based online policy iteration algorithms for Markov decision processes
    Fang, HT
    Cao, XR
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, 49(04): 493-505
  • [30] Robust topological policy iteration for infinite horizon bounded Markov Decision Processes
    Silva Reis, Willy Arthur
    de Barros, Leliane Nunes
    Delgado, Karina Valdivia
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2019, 105: 287-304