A study of value iteration and policy iteration for Markov decision processes in deterministic systems

Cited by: 0
Authors
Zheng, Haifeng [1 ]
Wang, Dan [1 ]
Affiliations
[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China
Source
AIMS MATHEMATICS | 2024 / Vol. 9 / No. 12
Keywords
Markov decision processes; deterministic system; value iteration; policy iteration; average cost criterion
DOI
10.3934/math.20241613
CLC number
O29 [Applied Mathematics]
Discipline code
070104
Abstract
In the context of deterministic discrete-time control systems, we examine the implementation of value iteration (VI) and policy iteration (PI) algorithms for Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transition function plays a pivotal role, since the convergence criteria of these algorithms are closely tied to the structure of the transition law governing state evolution. For VI, convergence requires verifying that the difference of successive cost functions stabilizes to a constant k, uniformly across states. In contrast, PI converges when the value function remains unchanged over successive iterations. Finally, a detailed example demonstrates the conditions under which each algorithm converges, underscoring the practicality of these methods in deterministic settings.
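The two stopping rules described in the abstract can be sketched on a toy finite deterministic MDP. Everything below is illustrative and not taken from the paper: the three states, the transition table `f`, and the costs `c` are invented. VI is run under the average-cost criterion, stopping once the difference function v_{n+1} - v_n is (numerically) constant across states, that constant being the optimal average cost; PI is shown in the simpler discounted form, stopping when the greedy policy, and hence its value function, repeats.

```python
import numpy as np

# Hypothetical deterministic MDP: 3 states, 2 actions.
# f[s, a] = next state reached from s under action a (deterministic transition)
# c[s, a] = one-stage cost
f = np.array([[1, 2], [0, 2], [2, 0]])
c = np.array([[2.0, 5.0], [1.0, 3.0], [0.5, 4.0]])  # cheapest loop: state 2, action 0

def value_iteration(f, c, max_iter=1000, tol=1e-9):
    """Average-cost VI: stop when v_{n+1} - v_n is constant across states."""
    v = np.zeros(f.shape[0])
    for _ in range(max_iter):
        q = c + v[f]                       # q[s, a] = c(s, a) + v_n(f(s, a))
        v_next = q.min(axis=1)
        diff = v_next - v                  # the difference function
        if diff.max() - diff.min() < tol:  # stabilized to a constant k
            return diff.mean(), q.argmin(axis=1)  # (average cost, greedy policy)
        v = v_next
    raise RuntimeError("difference function did not stabilize")

def policy_iteration(f, c, beta=0.9):
    """Discounted PI: stop when the policy (hence its value function) repeats."""
    n = f.shape[0]
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = c_pi exactly;
        # P_pi is a 0/1 matrix because transitions are deterministic.
        P = np.zeros((n, n))
        P[np.arange(n), f[np.arange(n), policy]] = 1.0
        v = np.linalg.solve(np.eye(n) - beta * P, c[np.arange(n), policy])
        # Policy improvement: greedy one-step lookahead.
        new_policy = (c + beta * v[f]).argmin(axis=1)
        if np.array_equal(new_policy, policy):  # unchanged -> converged
            return v, policy
        policy = new_policy

g, pi_vi = value_iteration(f, c)
v_pi, pi_pi = policy_iteration(f, c)
print(f"average cost = {g}, VI policy = {pi_vi}, PI policy = {pi_pi}")
```

In this example the difference function settles to the constant 0.5 after a few iterations, which is the cost of the self-loop at state 2; both algorithms recover the same stationary policy that steers every state toward that loop.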
Pages: 33818 - 33842
Page count: 25
Related papers
50 records in total
  • [31] Generalized Second-Order Value Iteration in Markov Decision Processes
    Kamanchi, Chandramouli
    Diddigi, Raghuram Bharadwaj
    Bhatnagar, Shalabh
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (08) : 4241 - 4247
  • [32] ASYNCHRONOUS VALUE ITERATION FOR MARKOV DECISION PROCESSES WITH CONTINUOUS STATE SPACES
    Yang, Xiangyu
    Hu, Jian-Qiang
    Hu, Jiaqiao
    Peng, Yijie
    2020 WINTER SIMULATION CONFERENCE (WSC), 2020, : 2856 - 2866
  • [33] Approximate policy iteration with a policy language bias: Solving relational markov decision processes
    Fern, Alan
    Yoon, Sungwook
    Givan, Robert
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2006, 25 : 75 - 118
  • [34] Approximate policy iteration with a policy language bias: Solving relational Markov decision processes
    Fern, A
    Yoon, S
    Givan, R
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2006, 25 : 75 - 118
  • [35] Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations
    Abate, Alessandro
    Ceska, Milan
    Kwiatkowska, Marta
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2016, 2016, 9938 : 13 - 31
  • [36] COMPUTATIONAL COMPARISON OF POLICY ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION-PROCESSES
    HARTLEY, R
    LAVERCOMBE, AC
    THOMAS, LC
    COMPUTERS & OPERATIONS RESEARCH, 1986, 13 (04) : 411 - 420
  • [37] Partial policy iteration for L1-Robust Markov decision processes
    Ho, Chin Pang
    Petrik, Marek
    Wiesemann, Wolfram
    Journal of Machine Learning Research, 2021, 22
  • [38] Cosine Policy Iteration for Solving Infinite-Horizon Markov Decision Processes
    Frausto-Solis, Juan
    Santiago, Elizabeth
    Mora-Vargas, Jaime
    MICAI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, 5845 : 75 - +
  • [39] COMPUTATIONAL COMPARISON OF POLICY ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION PROCESSES.
    Hartley, R.
    Lavercombe, A.C.
    Thomas, L.C.
    COMPUTERS & OPERATIONS RESEARCH, 1986, 13 (04) : 411 - 420
  • [40] A note on the convergence of policy iteration in Markov decision processes with compact action spaces
    Golubin, AY
    MATHEMATICS OF OPERATIONS RESEARCH, 2003, 28 (01) : 194 - 200