Approximate Policy Iteration for Semi-Markov Control Revisited

Cited by: 1
Author
Gosavi, Abhijit [1]
Affiliation
[1] Missouri Univ Sci & Technol, Rolla, MO 65409 USA
Source
Keywords
approximate policy iteration; reinforcement learning; average reward; semi-Markov; convergence
DOI
10.1016/j.procs.2011.08.046
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The semi-Markov decision process (SMDP) can be solved via reinforcement learning without generating its transition model. We briefly review existing algorithms based on approximate policy iteration (API) for solving this problem under the infinite-horizon discounted-reward and average-reward criteria. API techniques have recently attracted significant interest in the literature. We first present and analyze an extension of an existing API algorithm for discounted reward that can handle continuous reward rates. We then consider its average-reward counterpart, which requires an update based on the stochastic shortest path (SSP). We study the convergence properties of the algorithm that does not require the SSP update.
Pages: 7
Related papers
50 in total
  • [31] Deterministic policy gradient algorithms for semi-Markov decision processes
    Hosseinloo, Ashkan Haji
    Dahleh, Munther A.
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (07) : 4008 - 4019
  • [32] Optimum maintenance policy with inspection by Semi-Markov Decision Processes
    Ge, Haifeng
    Tomasevicz, Curtis L.
    Asgarpoor, Sohrab
    [J]. 2007 39TH NORTH AMERICAN POWER SYMPOSIUM, VOLS 1 AND 2, 2007, : 541 - 546
  • [33] Optimum maintenance policy using semi-Markov decision processes
    Tomasevicz, Curtis L.
    Asgarpoor, Sohrab
    [J]. 2006 38TH ANNUAL NORTH AMERICAN POWER SYMPOSIUM, NAPS-2006 PROCEEDINGS, 2006, : 23 - +
  • [34] Attainability for Markov and Semi-Markov Chains
    Verbeken, Brecht
    Guerry, Marie-Anne
    [J]. MATHEMATICS, 2024, 12 (08)
  • [35] An approximation approach to ergodic semi-Markov control processes
    Jaśkiewicz A.
    [J]. Mathematical Methods of Operations Research, 2001, 54 (1) : 1 - 19
  • [36] Optimum maintenance policy using semi-Markov decision processes
    Tomasevicz, Curtis L.
    Asgarpoor, Sohrab
    [J]. ELECTRIC POWER SYSTEMS RESEARCH, 2009, 79 (09) : 1286 - 1291
  • [37] COMPARISON OF SEMI-MARKOV AND MARKOV PROCESSES
    KURTZ, TG
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1971, 42 (03) : 991 - &
  • [38] An approximation approach to ergodic semi-Markov control processes
    Jaskiewicz, A
    [J]. MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2001, 54 (01) : 1 - 19
  • [39] Adaptive optimisation of timeout policy for dynamic power management based on semi-Markov control processes
    Jiang, Q.
    Xi, H. -S.
    Yin, B. -Q.
    [J]. IET CONTROL THEORY AND APPLICATIONS, 2010, 4 (10) : 1945 - 1958
  • [40] Approximate policy iteration with a policy language bias: Solving relational markov decision processes
    Fern, Alan
    Yoon, Sungwook
    Givan, Robert
    [J]. Journal of Artificial Intelligence Research, 2006, 25 : 75 - 118