Distributional Reachability for Markov Decision Processes: Theory and Applications

Cited by: 0
Authors
Gao, Yulong [1 ]
Abate, Alessandro [2 ]
Xie, Lihua [3 ]
Johansson, Karl Henrik [4 ]
Affiliations
[1] Imperial Coll London, Dept Elect & Elect Engn, London SW7 2AZ, England
[2] Univ Oxford, Dept Comp Sci, Oxford OX1 3QD, England
[3] Nanyang Technol Univ, Sch Elect & Elect Engn, Nanyang 639798, Singapore
[4] KTH Royal Inst Technol & Digital Futures, Div Decis & Control Syst, Stockholm, Sweden
Funding
Swedish Research Council
Keywords
Distributional reachability; Markov decision processes (MDPs); probabilistic reachability; reach-avoid problems; set invariance; ALGORITHM;
DOI
10.1109/TAC.2023.3341282
CLC Classification
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
We study distributional reachability for finite Markov decision processes (MDPs) from a control-theoretic perspective. Unlike standard probabilistic reachability notions, which are defined over MDP states or trajectories, in this article reachability is formulated over the space of probability distributions. We propose two set-valued maps for the forward and backward distributional reachability problems: the forward map collects all state distributions that can be reached from a set of initial distributions, while the backward map collects all state distributions that can reach a set of final distributions. We show that there exists a maximal invariant set under the forward map and that this set is the region to which the state distributions eventually always belong, regardless of the initial state distribution and the policy. The backward map provides an alternative way to solve a class of important problems for MDPs: the study of controlled invariance, the characterization of the domain of attraction, and reach-avoid problems. Three case studies illustrate the effectiveness of our approach.
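To make the forward map concrete, the following sketch computes the one-step image of a state distribution for a small, made-up 2-state, 2-action MDP (the transition probabilities and all names here are illustrative assumptions, not taken from the paper). It uses the standard fact that the one-step set of reachable distributions is the convex hull of the images of the current distribution under the deterministic Markov policies, so only those finitely many vertices are enumerated.

```python
from itertools import product

# Hypothetical MDP: P[a][s][t] = probability of moving s -> t under action a.
# These numbers are invented for illustration only.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.7, 0.3]],  # action 1
]

def step(mu, policy):
    """Push a state distribution mu forward one step under a
    deterministic policy (a tuple mapping each state to an action)."""
    n = len(mu)
    return [sum(mu[s] * P[policy[s]][s][t] for s in range(n)) for t in range(n)]

def forward_reach_vertices(mu):
    """Vertices of the one-step forward reachable set of distributions:
    the images of mu under every deterministic Markov policy.
    The full reachable set is the convex hull of these points."""
    n_states, n_actions = len(mu), len(P)
    return [step(mu, pol) for pol in product(range(n_actions), repeat=n_states)]

# Example: starting from the uniform distribution over the two states.
verts = forward_reach_vertices([0.5, 0.5])
```

Iterating `step` over a horizon (or taking unions of these hulls) yields the multi-step forward reachable sets whose limiting behavior the maximal invariant set characterizes.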
Pages: 4598–4613
Page count: 16