A study of value iteration and policy iteration for Markov decision processes in Deterministic systems

Cited by: 0

Authors:

Zheng, Haifeng [1]

Wang, Dan [1]

Affiliations:

[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China

Source:

AIMS MATHEMATICS | 2024 / Vol. 9 / No. 12

Keywords:

Markov decision processes; Deterministic system; value iteration; policy iteration; average cost criterion;

DOI:

10.3934/math.20241613

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

In the context of deterministic discrete-time control systems, we examined the implementation of value iteration (VI) and policy iteration (PI) algorithms in Markov decision processes (MDPs) situated within Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, as the convergence criteria of these algorithms are deeply interconnected with the inherent characteristics of the probability function governing state transitions. For VI, convergence is contingent upon verifying that the cost difference function stabilizes to a constant k, ensuring uniformity across iterations. In contrast, PI achieves convergence when the value function maintains consistent values over successive iterations. Finally, a detailed example demonstrates the conditions under which convergence of the algorithms is achieved, underscoring the practicality of these methods in deterministic settings.
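
The VI stopping rule described in the abstract (the difference between successive cost iterates settling to a constant k) can be illustrated with a minimal sketch. This is not the paper's algorithm or example: it assumes a finite toy problem rather than Borel spaces, and the names `NEXT_STATE`, `COST`, and `value_iteration` are hypothetical. Transitions are deterministic, so each state-action pair maps to exactly one next state.

```python
# Toy deterministic MDP (illustrative assumption, not taken from the paper).
# NEXT_STATE[s][a] is the unique successor of state s under action a;
# action 0 = stay, action 1 = move to state (s + 1) mod 3.
NEXT_STATE = [[0, 1], [1, 2], [2, 0]]
# COST[s][a] is the one-step cost of taking action a in state s.
COST = [[2.0, 4.0], [1.0, 4.0], [3.0, 4.0]]


def value_iteration(next_state, cost, tol=1e-9, max_iter=10_000):
    """Undiscounted value iteration for the average cost criterion.

    Iterates v_{n+1}(s) = min_a [cost(s, a) + v_n(next_state(s, a))] and stops
    when v_{n+1} - v_n is (numerically) the same constant k in every state.
    Returns k and a policy that is greedy with respect to the last iterate.
    """
    n_states = len(cost)
    v = [0.0] * n_states
    for _ in range(max_iter):
        v_new = [
            min(c + v[ns] for c, ns in zip(cost[s], next_state[s]))
            for s in range(n_states)
        ]
        diff = [new - old for new, old in zip(v_new, v)]
        v = v_new
        # Span of the difference function: zero means it is constant.
        if max(diff) - min(diff) < tol:
            k = 0.5 * (max(diff) + min(diff))
            policy = [
                min(
                    range(len(cost[s])),
                    key=lambda a: cost[s][a] + v[next_state[s][a]],
                )
                for s in range(n_states)
            ]
            return k, policy
    raise RuntimeError("difference function did not stabilize")


k, policy = value_iteration(NEXT_STATE, COST)
print("constant k:", k)
print("greedy policy:", policy)
```

In this toy problem the cheapest long-run behaviour is to reach state 1 and stay there, so the constant k coincides with the stay cost in state 1. The span test is only one way to check that the difference function is constant; in periodic systems the plain iteration may oscillate and never pass it, which is why the sketch raises an error after `max_iter` steps.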

Citation

Pages: 33818-33842

Number of pages: 25
