A study of value iteration and policy iteration for Markov decision processes in Deterministic systems

Cited by: 0
Authors
Zheng, Haifeng [1]
Wang, Dan [1]
Affiliations
[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China
Source
AIMS MATHEMATICS | 2024 / Vol. 9 / Issue 12
Keywords
Markov decision processes; Deterministic system; value iteration; policy iteration; average cost criterion
DOI
10.3934/math.20241613
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms in Markov decision processes (MDPs) situated within Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, as the convergence criteria of these algorithms are closely tied to the characteristics of the probability function governing state transitions. For VI, convergence is contingent upon verifying that the cost difference function stabilizes to a constant k, ensuring uniformity across iterations. In contrast, PI achieves convergence when the value function maintains consistent values over successive iterations. Finally, a detailed example demonstrates the conditions under which convergence of the algorithm is achieved, underscoring the practicality of these methods in deterministic settings.
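The VI stopping rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a finite state and action set in place of Borel spaces, and the names `NEXT_STATE`, `COST`, and `average_cost_value_iteration` are hypothetical. The loop stops once the difference between successive value functions is (nearly) the same constant k in every state.

```python
# Hypothetical 3-state deterministic system: action 0 = stay, action 1 = move on.
NEXT_STATE = [[0, 1], [1, 2], [2, 0]]            # NEXT_STATE[s][a] -> next state
COST = [[2.0, 1.0], [1.0, 3.0], [4.0, 0.5]]      # COST[s][a] -> one-step cost


def average_cost_value_iteration(next_state, cost, tol=1e-9, max_iter=10_000):
    """Iterate V_{n+1}(s) = min_a [c(s, a) + V_n(f(s, a))].

    Stops when the difference V_{n+1} - V_n is constant across states
    (span below tol) and returns that constant k with a greedy policy.
    """
    n_states = len(next_state)
    v = [0.0] * n_states
    for _ in range(max_iter):
        # One-step lookahead cost of each action in each state.
        q = [
            [cost[s][a] + v[next_state[s][a]] for a in range(len(cost[s]))]
            for s in range(n_states)
        ]
        v_new = [min(row) for row in q]
        diff = [v_new[s] - v[s] for s in range(n_states)]
        if max(diff) - min(diff) < tol:
            policy = [row.index(min(row)) for row in q]
            k = 0.5 * (max(diff) + min(diff))
            return k, policy
        v = v_new
    raise RuntimeError("cost difference did not stabilize within max_iter")


if __name__ == "__main__":
    k, policy = average_cost_value_iteration(NEXT_STATE, COST)
    print(k, policy)
```

In this toy system the cheapest long-run behavior is to reach state 1 and stay there, so the constant k equals the one-step cost of staying in state 1. In deterministic systems whose optimal cycles have a period greater than one, the differences may oscillate rather than settle, which is one reason a convergence condition of the kind the abstract mentions is needed.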
Pages: 33818-33842
Number of pages: 25