Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Times cited: 1

Authors:

Abate, Alessandro [1]

Ceska, Milan [1, 2]

Kwiatkowska, Marta [1]

Affiliations:

[1] Univ Oxford, Dept Comp Sci, Oxford, England

[2] Brno Univ Technol, Fac Informat Technol, CS-61090 Brno, Czech Republic

Source:

AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2016 | 2016 / Vol. 9938

Funding:

Engineering and Physical Sciences Research Council (UK);

Keywords:

ABSTRACTION;

DOI:

10.1007/978-3-319-46520-3_2

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale when increasing the dimension of the state space, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, providing evidence that the state-space reduction results in considerable acceleration of the policy iteration scheme, while being able to meet the required level of precision.
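For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of plain (exact) policy iteration for a small discounted MDP. It is not the adaptive-aggregation scheme proposed in the paper. The error estimate it computes is the textbook Bellman-residual bound ||V - V*||_inf <= ||T V - V||_inf / (1 - gamma), where T is the Bellman optimality operator, and not the paper's aggregation-specific bounds. The data layout (`P[a][s][t]`, `R[a][s]`) and all function names are hypothetical choices made for illustration.

```python
"""Exact policy iteration for a small discounted MDP (illustrative sketch only).

Assumptions (hypothetical, not from the paper):
  P[a][s][t] = probability of moving from state s to state t under action a
  R[a][s]    = immediate reward for taking action a in state s
"""
from typing import List, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy of [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def evaluate(policy: List[int], P, R, gamma: float) -> List[float]:
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
    n = len(policy)
    A = [[(1.0 if s == t else 0.0) - gamma * P[policy[s]][s][t] for t in range(n)]
         for s in range(n)]
    b = [R[policy[s]][s] for s in range(n)]
    return solve(A, b)


def q_value(a: int, s: int, V: List[float], P, R, gamma: float) -> float:
    """One-step lookahead value of action a in state s."""
    return R[a][s] + gamma * sum(p * v for p, v in zip(P[a][s], V))


def policy_iteration(P, R, gamma: float) -> Tuple[List[int], List[float]]:
    """Alternate exact evaluation and greedy improvement until the policy is stable."""
    n_actions, n = len(P), len(P[0])
    policy = [0] * n
    while True:
        V = evaluate(policy, P, R, gamma)
        new_policy = []
        for s in range(n):
            best = max(range(n_actions), key=lambda a: q_value(a, s, V, P, R, gamma))
            # Keep the current action unless the greedy one is strictly better,
            # so the loop cannot cycle between equally good policies.
            if q_value(best, s, V, P, R, gamma) <= q_value(policy[s], s, V, P, R, gamma) + 1e-12:
                best = policy[s]
            new_policy.append(best)
        if new_policy == policy:
            return policy, V
        policy = new_policy


def bellman_residual_bound(V: List[float], P, R, gamma: float) -> float:
    """Generic a-posteriori bound: ||V - V*||_inf <= ||T V - V||_inf / (1 - gamma)."""
    n_actions, n = len(P), len(V)
    TV = [max(q_value(a, s, V, P, R, gamma) for a in range(n_actions)) for s in range(n)]
    residual = max(abs(x - y) for x, y in zip(TV, V))
    return residual / (1.0 - gamma)


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP: action 0 = "stay", action 1 = "switch".
    P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    R = [[0.0, 1.0], [0.0, 0.0]]
    policy, V = policy_iteration(P, R, gamma=0.9)
    print(policy, [round(v, 6) for v in V])  # [1, 0] [9.0, 10.0]
```

Each iteration solves an n-by-n linear system over the full state space, which is the cost that grows with the dimension of the state space; aggregation-based schemes such as the one described in the abstract aim to evaluate policies on a smaller, lumped model instead.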

Pages: 13-31

Number of pages: 19

Related records (50 in total):

[31] Potential-based online policy iteration algorithms for Markov decision processes
Fang, HT
Cao, XR
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, 49 (04) : 493 - 505
[32] Robust topological policy iteration for infinite horizon bounded Markov Decision Processes
Silva Reis, Willy Arthur
de Barros, Leliane Nunes
Delgado, Karina Valdivia
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2019, 105 : 287 - 304
[33] Approximate robust policy iteration for discounted infinite-horizon Markov decision processes with uncertain stationary parametric transition matrices
Li, Baohua
Si, Jennie
2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 2052 - 2057
[34] Adaptive Sampling for Best Policy Identification in Markov Decision Processes
Al Marjani, Aymen
Proutiere, Alexandre
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[35] Approximate equivalence of Markov decision processes
Even-Dar, E
Mansour, Y
LEARNING THEORY AND KERNEL MACHINES, 2003, 2777 : 581 - 594
[36] Value set iteration for Markov decision processes
Chang, Hyeong Soo
AUTOMATICA, 2014, 50 (07) : 1940 - 1943
[37] Verification of General Markov Decision Processes by Approximate Similarity Relations and Policy Refinement
Haesaert, Sofie
Abate, Alessandro
Van den Hof, Paul M. J.
QUANTITATIVE EVALUATION OF SYSTEMS, QEST 2016, 2016, 9826 : 227 - 243
[38] VERIFICATION OF GENERAL MARKOV DECISION PROCESSES BY APPROXIMATE SIMILARITY RELATIONS AND POLICY REFINEMENT
Haesaert, Sofie
Soudjani, Sadegh Esmaeil Zadeh
Abate, Alessandro
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2017, 55 (04) : 2333 - 2367
[39] Temporal logic control of general Markov decision processes by approximate policy refinement
Haesaert, Sofie
Soudjani, Sadegh
Abate, Alessandro
IFAC PAPERSONLINE, 2018, 51 (16) : 73 - 78
[40] A NEW POLICY ITERATION SCHEME FOR MARKOV DECISION-PROCESSES USING SCHWEITZER FORMULA
LASSERRE, JB
JOURNAL OF APPLIED PROBABILITY, 1994, 31 (01) : 268 - 273
