Approximate policy iteration with a policy language bias: Solving relational Markov decision processes

Cited by: 0
Authors
Fern, Alan [1]
Yoon, Sungwook [2]
Givan, Robert [2]
Affiliations
[1] School of Electrical Engineering and Computer Science, Oregon State University, United States
[2] School of Electrical and Computer Engineering, Purdue University, United States
Abstract
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. © 2006 AI Access Foundation. All rights reserved.
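The core loop described in the abstract, approximate policy iteration whose improvement step learns a policy directly rather than fitting a value function, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system. The toy chain domain, the function names (`step`, `rollout_q`, `learn_policy`, `api`) and the parameters (`horizon`, `gamma`, `width`) are all hypothetical. A per-state majority-vote lookup table stands in for the paper's relational policy language and learner, and the random-walk bootstrapping routine is not shown. Each iteration estimates action values of the current policy by rollouts, labels each sampled state with its best action, and trains the next policy on those labels.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

State = int
Action = str
Policy = Callable[[State], Action]

ACTIONS: List[Action] = ["left", "right"]
GOAL: State = 5  # hypothetical toy chain: states 0..5, goal at the right end


def step(s: State, a: Action) -> Tuple[State, float]:
    """Toy deterministic transition: move one cell; reward 1.0 only on entering GOAL."""
    nxt = min(s + 1, GOAL) if a == "right" else max(s - 1, 0)
    return nxt, (1.0 if nxt == GOAL and s != GOAL else 0.0)


def rollout_q(s: State, a: Action, policy: Policy,
              horizon: int, gamma: float, width: int) -> float:
    """Estimate Q^pi(s, a): take action a, then follow `policy` for up to `horizon` steps.

    Averaging over `width` rollouts matters in stochastic domains; this toy is deterministic.
    """
    total = 0.0
    for _ in range(width):
        state, ret = step(s, a)
        discount = gamma
        for _ in range(horizon):
            if state == GOAL:  # treat the goal as absorbing with no further reward
                break
            state, r = step(state, policy(state))
            ret += discount * r
            discount *= gamma
        total += ret
    return total / width


def learn_policy(examples: List[Tuple[State, Action]], fallback: Policy) -> Policy:
    """Stand-in policy learner (assumption): majority vote per state, else defer to `fallback`."""
    votes: Dict[State, Counter] = defaultdict(Counter)
    for s, a in examples:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table[s] if s in table else fallback(s)


def api(policy: Policy, states: List[State], iterations: int = 5,
        horizon: int = 10, gamma: float = 0.9, width: int = 1) -> Policy:
    """Approximate policy iteration with a learning step in policy space."""
    for _ in range(iterations):
        examples = []
        for s in states:
            if s == GOAL:
                continue
            q = {a: rollout_q(s, a, policy, horizon, gamma, width) for a in ACTIONS}
            examples.append((s, max(q, key=q.get)))  # ties go to the first action listed
        policy = learn_policy(examples, fallback=policy)
    return policy
```

Starting from the uninformed always-`left` policy, `api(lambda s: "left", list(range(GOAL)), iterations=5)` yields a policy that moves `right` from every non-goal state, while after only 3 iterations states 0 and 1 still choose `left`. In this toy, the sparse goal reward propagates back roughly one state per iteration, a small-scale hint of why the abstract argues for bootstrapping when reward is sparse.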
DOI: not available
Document type: Journal article (JA)
Pages: 75-118