Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Cited by: 1
Authors
Abate, Alessandro [1 ]
Ceska, Milan [1 ,2 ]
Kwiatkowska, Marta [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
[2] Brno Univ Technol, Fac Informat Technol, CS-61090 Brno, Czech Republic
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
ABSTRACTION;
DOI
10.1007/978-3-319-46520-3_2
Chinese Library Classification
TP [Automation technology; computer technology];
Subject classification code
0812;
Abstract
We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, providing evidence that the state-space reduction yields considerable acceleration of the policy iteration scheme while meeting the required level of precision.
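The two ingredients the abstract combines — exact policy iteration as the baseline, and lumping states into aggregates to shrink the problem — can be sketched as follows. This is a minimal illustration with a fixed partition and uniform aggregation weights; the paper's actual contribution is choosing the aggregation adaptively and deriving explicit error bounds, neither of which this sketch implements.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Exact policy iteration on a finite MDP.

    P[a] is the |S| x |S| transition matrix of action a;
    R[s, a] is the immediate reward for taking action a in state s.
    """
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: greedy one-step lookahead on V.
        Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(n_actions)], axis=1)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

def aggregate_mdp(P, R, partition):
    """Lump states into aggregate states given by `partition` (state -> cluster id),
    weighting members of each cluster uniformly -- a simplifying assumption; the
    paper instead refines the aggregation adaptively and bounds the induced error."""
    n_states, n_actions = R.shape
    k = max(partition) + 1
    Phi = np.zeros((n_states, k))           # membership (disaggregation) matrix
    Phi[np.arange(n_states), partition] = 1.0
    W = Phi.T / Phi.sum(axis=0)[:, None]    # uniform weights within each cluster
    P_agg = [W @ P[a] @ Phi for a in range(n_actions)]
    R_agg = W @ R
    return P_agg, R_agg
```

The aggregate MDP has one state per cluster, so policy iteration on it touches k rather than |S| states; running `policy_iteration` on `(P_agg, R_agg)` and lifting the resulting policy back through the membership matrix yields the approximate policy whose suboptimality the paper's bounds quantify.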
Pages: 13-31
Page count: 19