A study of value iteration and policy iteration for Markov decision processes in deterministic systems

Cited: 0
|
Authors
Zheng, Haifeng [1 ]
Wang, Dan [1 ]
Affiliations
[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China
Source
AIMS MATHEMATICS | 2024, Vol. 9, Issue 12
Keywords
Markov decision processes; deterministic system; value iteration; policy iteration; average cost criterion
DOI
10.3934/math.20241613
CLC Number
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
In the context of deterministic discrete-time control systems, we examined the implementation of value iteration (VI) and policy iteration (PI) algorithms for Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transition function plays a pivotal role, as the convergence criteria of these algorithms are closely tied to the structure of the probability measure governing state transitions. For VI, convergence hinges on verifying that the cost-difference function stabilizes to a constant k, uniformly across states, as the iterations proceed. In contrast, PI converges when the value function remains unchanged over successive iterations. Finally, a detailed example demonstrates the conditions under which each algorithm converges, underscoring the practicality of these methods in deterministic settings.
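The VI convergence criterion described in the abstract can be illustrated on a small example. The sketch below uses a hypothetical three-state deterministic MDP (not taken from the paper): iterating the Bellman operator V_{n+1}(x) = min_a [c(x, a) + V_n(f(x, a))], the per-state differences V_{n+1}(x) - V_n(x) stabilize to a single constant k, which is the minimal average cost per stage (here the cheapest cycle has mean cost 1).

```python
# Value iteration for a small deterministic MDP under the average-cost
# criterion. The 3-state system below is a hypothetical illustration:
# states {0, 1, 2}, deterministic dynamics f(x, a), stage costs c(x, a).
f = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 2, (2, 'a'): 0}
c = {(0, 'a'): 1.0, (0, 'b'): 2.0, (1, 'a'): 1.0, (2, 'a'): 1.0}
states = [0, 1, 2]
actions = {0: ['a', 'b'], 1: ['a'], 2: ['a']}

def value_iteration(n_iter=200):
    """Iterate the Bellman operator; return the final value function
    and the last cost-difference function V_{n+1} - V_n."""
    V = {x: 0.0 for x in states}
    diff = {}
    for _ in range(n_iter):
        V_new = {x: min(c[(x, a)] + V[f[(x, a)]] for a in actions[x])
                 for x in states}
        diff = {x: V_new[x] - V[x] for x in states}
        V = V_new
    return V, diff

V, diff = value_iteration()
# The differences stabilize to the same constant k = 1 for every state:
# the minimal average cost, attained by the cycle 0 -> 1 -> 2 -> 0.
print(diff)  # {0: 1.0, 1: 1.0, 2: 1.0}
```

The self-loop at state 0 (action 'b', cost 2) is never optimal, since the three-state cycle achieves an average cost of 1 per stage; the uniform difference k = 1 is exactly the stabilization the paper's VI criterion checks for.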
Pages: 33818-33842
Page count: 25