Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Cited: 1
Authors
Abate, Alessandro [1 ]
Ceska, Milan [1 ,2 ]
Kwiatkowska, Marta [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
[2] Brno Univ Technol, Fac Informat Technol, CS-61090 Brno, Czech Republic
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
ABSTRACTION;
DOI
10.1007/978-3-319-46520-3_2
Chinese Library Classification
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem that is particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, providing evidence that the state-space reduction considerably accelerates the policy iteration scheme while meeting the required level of precision.
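To illustrate the two ingredients the abstract combines, the sketch below implements exact policy iteration on a small finite MDP and a lumped (aggregated) policy-evaluation step, where states in the same partition block share one value. This is only a minimal illustration, not the paper's adaptive algorithm: the partition here is fixed rather than adaptively refined, within-block weights are assumed uniform, and no error bound is computed.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Exact policy iteration. P: (A,S,S) row-stochastic transitions, R: (S,A) rewards."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve the linear system (I - gamma * P_pi) V = R_pi
        P_pi = P[policy, np.arange(S)]            # (S,S): transition rows under pi
        R_pi = R[np.arange(S), policy]            # (S,): rewards under pi
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily w.r.t. Q(s,a) = R(s,a) + gamma * E[V]
        Q = R + gamma * (P @ V).T                 # (S,A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

def aggregated_evaluation(P, R, policy, partition, gamma=0.9):
    """Evaluate `policy` on a lumped chain: states in the same block share one
    value. Uniform within-block weights are an illustrative assumption."""
    S = P.shape[1]
    K = partition.max() + 1
    Phi = np.eye(K)[partition]                    # (S,K) block-membership matrix
    W = Phi.T / Phi.sum(axis=0)[:, None]          # (K,S) uniform averaging weights
    P_pi = P[policy, np.arange(S)]
    R_pi = R[np.arange(S), policy]
    P_agg = W @ P_pi @ Phi                        # (K,K) lumped transitions
    R_agg = W @ R_pi                              # (K,) lumped rewards
    v_agg = np.linalg.solve(np.eye(K) - gamma * P_agg, R_agg)
    return Phi @ v_agg                            # lift block values back to (S,)

# Tiny random MDP: 2 actions, 6 states
rng = np.random.default_rng(0)
P = rng.random((2, 6, 6))
P /= P.sum(axis=2, keepdims=True)                 # normalise rows to be stochastic
R = rng.random((6, 2))
policy, V = policy_iteration(P, R, gamma=0.9)
V_lumped = aggregated_evaluation(P, R, policy, np.array([0, 0, 1, 1, 2, 2]), gamma=0.9)
```

The payoff of aggregation is that the linear solve runs on a K x K system instead of S x S; the paper's contribution is choosing the partition adaptively so that the induced value-function error stays within an explicit, tunable bound.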
Pages: 13-31
Page count: 19