Approximate Policy Iteration for Semi-Markov Control Revisited

Cited by: 1

Authors:

Gosavi, Abhijit [1]

Affiliations:

[1] Missouri Univ Sci & Technol, Rolla, MO 65409 USA

Source:

COMPLEX ADAPTIVE SYSTEMS | 2011 / Vol. 6

Keywords:

approximate policy iteration; reinforcement learning; average reward; semi-Markov; convergence

DOI:

10.1016/j.procs.2011.08.046

Chinese Library Classification (CLC) number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

The semi-Markov decision process can be solved via reinforcement learning without generating its transition model. We briefly review the existing algorithms based on approximate policy iteration (API) for solving this problem for discounted and average reward under the infinite horizon. API techniques have recently attracted significant interest in the literature. We first present and analyze an extension of an existing API algorithm for discounted reward that can handle continuous reward rates. We then consider its average reward counterpart, which requires an update based on the stochastic shortest path (SSP). We study the convergence properties of the algorithm that does not require the SSP update.


Pages: 7
