A study of value iteration and policy iteration for Markov decision processes in Deterministic systems

Cited by: 0
Authors
Zheng, Haifeng [1]
Wang, Dan [1]
Affiliations
[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China
Source
AIMS MATHEMATICS | 2024 / Vol. 9 / Issue 12
Keywords
Markov decision processes; Deterministic system; value iteration; policy iteration; average cost criterion
DOI
10.3934/math.20241613
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms in Markov decision processes (MDPs) situated within Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, as the convergence criteria of these algorithms are closely tied to the characteristics of the probability function governing state transitions. For VI, convergence is contingent upon verifying that the cost difference function stabilizes to a constant k, ensuring uniformity across iterations. In contrast, PI achieves convergence when the value function maintains consistent values over successive iterations. Finally, a detailed example demonstrates the conditions under which convergence of the algorithms is achieved, underscoring the practicality of these methods in deterministic settings.
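The VI stopping criterion described in the abstract can be illustrated with a minimal sketch. It assumes a small finite-state deterministic system under the average cost criterion, not the Borel-space setting of the paper. The function name `value_iteration_average_cost`, the toy transition table `next_state`, and the cost table `cost` are hypothetical and are not taken from the paper. The loop stops once the difference between successive value functions is the same constant k in every state.

```python
def value_iteration_average_cost(next_state, cost, tol=1e-9, max_iter=10_000):
    """Undiscounted value iteration for a finite deterministic system.

    next_state[x][a] is the successor of state x under action a,
    cost[x][a] is the one-step cost. Both tables are illustrative
    assumptions, not data from the paper.
    """
    n_states = len(next_state)
    v = [0.0] * n_states
    for iteration in range(1, max_iter + 1):
        # Bellman update: V_{n+1}(x) = min_a [ c(x, a) + V_n(f(x, a)) ]
        v_new = [
            min(cost[x][a] + v[next_state[x][a]]
                for a in range(len(next_state[x])))
            for x in range(n_states)
        ]
        # Cost difference function V_{n+1}(x) - V_n(x)
        diff = [v_new[x] - v[x] for x in range(n_states)]
        span = max(diff) - min(diff)
        v = v_new
        if span <= tol:
            # The difference is (numerically) the same constant k in all states
            return diff[0], v, iteration
    raise RuntimeError("cost difference did not stabilize to a constant")


# Toy system: action 0 stays in place, action 1 moves to the next state (mod 3).
next_state = [[0, 1], [1, 2], [2, 0]]
cost = [[2.0, 2.0], [1.0, 2.0], [3.0, 2.0]]

k, values, iterations = value_iteration_average_cost(next_state, cost)
print(k, values, iterations)
```

In this toy system the cheapest long-run behavior is to reach state 1 and stay there, so the constant k matches the stay cost of that state. The self-loops keep the example free of periodic oscillation; a purely cyclic deterministic system may never satisfy the stopping test, which is the kind of situation the convergence conditions in the abstract are concerned with.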
Pages: 33818-33842
Number of pages: 25
Related papers
50 in total
  • [1] Geometric Policy Iteration for Markov Decision Processes
    Wu, Yue
    De Loera, Jesus A.
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 2070 - 2078
  • [2] Policy set iteration for Markov decision processes
    Chang, Hyeong Soo
    AUTOMATICA, 2013, 49 (12) : 3687 - 3689
  • [3] Value set iteration for Markov decision processes
    Chang, Hyeong Soo
    AUTOMATICA, 2014, 50 (07) : 1940 - 1943
  • [4] Evolutionary policy iteration for solving Markov decision processes
    Chang, HS
    Lee, HG
    Fu, MC
    Marcus, SI
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2005, 50 (11) : 1804 - 1808
  • [5] Efficient Policy Iteration for Periodic Markov Decision Processes
    Osogami, Takayuki
    Raymond, Rudy
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 1167 - 1172
  • [6] Policy iteration for robust nonstationary Markov decision processes
    Sinha, Saumya
    Ghate, Archis
    OPTIMIZATION LETTERS, 2016, 10 (08) : 1613 - 1628
  • [7] Policy iteration for robust nonstationary Markov decision processes
    Saumya Sinha
    Archis Ghate
    Optimization Letters, 2016, 10 : 1613 - 1628
  • [8] Policy Iteration for Decentralized Control of Markov Decision Processes
    Bernstein, Daniel S.
    Amato, Christopher
    Hansen, Eric A.
    Zilberstein, Shlomo
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2009, 34 : 89 - 132
  • [9] The Smoothed Complexity of Policy Iteration for Markov Decision Processes
    Christ, Miranda
    Yannakakis, Mihalis
    PROCEEDINGS OF THE 55TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, STOC 2023, 2023, : 1890 - 1903
  • [10] Topological Value Iteration Algorithm for Markov Decision Processes
    Dai, Peng
    Goldsmith, Judy
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1860 - 1865