A note on the convergence of policy iteration in Markov decision processes with compact action spaces

Times cited: 1
Author
Golubin, AY [1]
Affiliation
[1] Moscow Inst Elect & Math, Dept Operat Res, Moscow 109028, Russia
Keywords
Markov decision processes; optimality equation; average reward; policy iteration
DOI
10.1287/moor.28.1.194.14255
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
The undiscounted, unichain, finite state Markov decision process with compact action space is studied. We provide a counterexample for a result in Hordijk and Puterman (1987) and give an alternate proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis essentially uses a two-term matrix representation for the relative value vectors generated by the policy iteration procedure.
Pages: 194-200
Number of pages: 7
Related papers
50 in total
  • [1] Geometric Policy Iteration for Markov Decision Processes
    Wu, Yue
    De Loera, Jesus A.
    [J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 2070 - 2078
  • [2] Policy set iteration for Markov decision processes
    Chang, Hyeong Soo
    [J]. AUTOMATICA, 2013, 49 (12) : 3687 - 3689
  • [3] Efficient Policy Iteration for Periodic Markov Decision Processes
    Osogami, Takayuki
    Raymond, Rudy
    [J]. 21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 1167 - 1172
  • [4] Evolutionary policy iteration for solving Markov decision processes
    Chang, HS
    Lee, HG
    Fu, MC
    Marcus, SI
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2005, 50 (11) : 1804 - 1808
  • [5] Policy iteration for robust nonstationary Markov decision processes
    Saumya Sinha
    Archis Ghate
    [J]. Optimization Letters, 2016, 10 : 1613 - 1628
  • [6] Policy iteration for robust nonstationary Markov decision processes
    Sinha, Saumya
    Ghate, Archis
    [J]. OPTIMIZATION LETTERS, 2016, 10 (08) : 1613 - 1628
  • [7] Policy Iteration for Decentralized Control of Markov Decision Processes
    Bernstein, Daniel S.
    Amato, Christopher
    Hansen, Eric A.
    Zilberstein, Shlomo
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2009, 34 : 89 - 132
  • [8] The Smoothed Complexity of Policy Iteration for Markov Decision Processes
    Christ, Miranda
    Yannakakis, Mihalis
    [J]. PROCEEDINGS OF THE 55TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, STOC 2023, 2023, : 1890 - 1903
  • [9] ON THE CONVERGENCE OF POLICY ITERATION IN FINITE STATE UNDISCOUNTED MARKOV DECISION-PROCESSES - THE UNICHAIN CASE
    HORDIJK, A
    PUTERMAN, ML
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 1987, 12 (01) : 163 - 176
  • [10] Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
    Zhu, Quanxin
    Yang, Xinsong
    Huang, Chuangxia
    [J]. ABSTRACT AND APPLIED ANALYSIS, 2009,