Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Cited by: 1
Authors
Abate, Alessandro [1 ]
Ceska, Milan [1 ,2 ]
Kwiatkowska, Marta [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
[2] Brno Univ Technol, Fac Informat Technol, CS-61090 Brno, Czech Republic
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
ABSTRACTION;
DOI
10.1007/978-3-319-46520-3_2
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, providing evidence that the state-space reduction results in considerable acceleration of the policy iteration scheme, while meeting the required level of precision.
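To make the approach concrete, the following is a minimal sketch of policy iteration on a discounted MDP, together with a crude aggregation step: states are lumped into blocks, the abstract model averages outgoing transition mass per block, and the abstract value function is lifted back to the concrete states. The toy MDP, the fixed partition, and the uniform in-block weighting are all illustrative assumptions; the paper's scheme instead adapts the partition and derives explicit, tunable error bounds, which this sketch does not attempt.

```python
import numpy as np

# Toy MDP: n_states states, n_actions actions; P[a, s, :] is a transition
# distribution and R[a, s] an immediate reward (randomly generated here).
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.random((n_actions, n_states))

def policy_iteration(P, R, gamma):
    """Exact policy iteration: evaluate the current policy by solving the
    linear system (I - gamma * P_pi) V = R_pi, then improve greedily."""
    _, n, _ = P.shape
    policy = np.zeros(n, dtype=int)
    while True:
        P_pi = P[policy, np.arange(n)]        # rows chosen by the policy
        R_pi = R[policy, np.arange(n)]
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        Q = R + gamma * P @ V                 # action values, shape (a, s)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

pi_star, V_star = policy_iteration(P, R, gamma)

# Fixed (non-adaptive) partition of the state space into blocks.
blocks = [[0, 1], [2, 3], [4, 5]]
k = len(blocks)

# Abstract MDP: average each block's outgoing probability mass per target
# block, and average rewards within a block (uniform in-block weights).
P_abs = np.zeros((n_actions, k, k))
R_abs = np.zeros((n_actions, k))
for a in range(n_actions):
    for j, bj in enumerate(blocks):
        R_abs[a, j] = R[a, bj].mean()
        for l, bl in enumerate(blocks):
            P_abs[a, j, l] = P[a][np.ix_(bj, bl)].sum(axis=1).mean()

pi_abs, V_abs = policy_iteration(P_abs, R_abs, gamma)

# Lift the abstract value function back to concrete states and measure
# how far it is from the exact optimal values.
block_of = np.empty(n_states, dtype=int)
for j, b in enumerate(blocks):
    block_of[b] = j
V_lift = V_abs[block_of]
err = np.abs(V_star - V_lift).max()
print(f"max lifted-value error: {err:.4f}")
```

The abstract model has 3 states instead of 6, so each policy-evaluation solve is cheaper; the printed error shows the price paid for this particular (arbitrary) partition, whereas the paper's adaptive aggregation chooses the partition so that the error stays below a prescribed bound.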
Pages: 13 - 31 (19 pages)
Related Papers (50 total; items [41]-[50] shown)
  • [41] The policy iteration algorithm for average reward Markov decision processes with general state space
    Meyn, SP
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (12) : 1663 - 1680
  • [42] Average optimality for continuous-time Markov decision processes with a policy iteration approach
    Zhu, Quanxin
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2008, 339 (01) : 691 - 704
  • [43] Approximate Policy Iteration for Semi-Markov Control Revisited
    Gosavi, Abhijit
    COMPLEX ADAPTIVE SYSTEMS, 2011, 6
  • [44] Topological Value Iteration Algorithm for Markov Decision Processes
    Dai, Peng
    Goldsmith, Judy
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1860 - 1865
  • [45] New prioritized value iteration for Markov decision processes
    de Guadalupe Garcia-Hernandez, Ma.
    Ruiz-Pinales, Jose
    Onaindia, Eva
    Gabriel Avina-Cervantes, J.
    Ledesma-Orozco, Sergio
    Alvarado-Mendez, Edgar
    Reyes-Ballesteros, Alberto
    ARTIFICIAL INTELLIGENCE REVIEW, 2012, 37 (02) : 157 - 167
  • [47] Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
    Zhu, Quanxin
    Yang, Xinsong
    Huang, Chuangxia
    ABSTRACT AND APPLIED ANALYSIS, 2009,
  • [48] ON THE CONVERGENCE OF POLICY ITERATION IN FINITE STATE UNDISCOUNTED MARKOV DECISION-PROCESSES - THE UNICHAIN CASE
    HORDIJK, A
    PUTERMAN, ML
    MATHEMATICS OF OPERATIONS RESEARCH, 1987, 12 (01) : 163 - 176
  • [49] Mean Field Approximation of the Policy Iteration Algorithm for Graph-based Markov Decision Processes
    Peyrard, Nathalie
    Sabbadin, Regis
    ECAI 2006, PROCEEDINGS, 2006, 141 : 595 - +
  • [50] Quantitative Programming and Markov Decision Processes
    Todoran, Eneia Nicolae
    2022 24TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, SYNASC, 2022, : 117 - 124