Approximate Policy Iteration for Semi-Markov Control Revisited

Cited by: 1
Author
Gosavi, Abhijit [1]
Affiliation
[1] Missouri Univ Sci & Technol, Rolla, MO 65409 USA
Source
Keywords
approximate policy iteration; reinforcement learning; average reward; semi-Markov; convergence
DOI
10.1016/j.procs.2011.08.046
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The semi-Markov decision process can be solved via reinforcement learning without generating its transition model. We briefly review existing algorithms based on approximate policy iteration (API) for solving this problem under the discounted and average reward criteria over an infinite horizon. API techniques have recently attracted significant interest in the literature. We first present and analyze an extension of an existing API algorithm for discounted reward that can handle continuous reward rates. We then consider its average reward counterpart, which requires an update based on the stochastic shortest path (SSP). We study the convergence properties of the algorithm that does not require the SSP update.
Pages: 7
Related papers
50 in total
  • [1] Approximate Policy Iteration for Markov Control Revisited
    Gosavi, Abhijit
    [J]. COMPLEX ADAPTIVE SYSTEMS 2012, 2012, 12 : 90 - 95
  • [2] AN IMPROVED POLICY ITERATION ALGORITHM FOR SEMI-MARKOV MAINTENANCE PROBLEMS
    VALDEZFLORES, C
    FELDMAN, RM
    [J]. IIE TRANSACTIONS, 1992, 24 (01) : 55 - 63
  • [3] Online Policy Iteration Algorithm for Semi-Markov Switching State-Space Control Processes
    Jiang, Qi
    Xi, Hong-Sheng
    Yin, Bao-Qin
    [J]. PROCEEDINGS OF THE 48TH IEEE CONFERENCE ON DECISION AND CONTROL, 2009 HELD JOINTLY WITH THE 2009 28TH CHINESE CONTROL CONFERENCE (CDC/CCC 2009), 2009, : 2298 - 2303
  • [4] RELATIVE VALUE ITERATION FOR AVERAGE REWARD SEMI-MARKOV CONTROL VIA SIMULATION
    Gosavi, Abhijit
    [J]. 2013 WINTER SIMULATION CONFERENCE (WSC), 2013, : 623 - 630
  • [5] The optimal robust control policy for uncertain semi-Markov control processes
    Tang, H
    Xi, HS
    Yin, BQ
    [J]. INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2005, 36 (13) : 791 - 800
  • [6] Softened Approximate Policy Iteration for Markov Games
    Perolat, Julien
    Piot, Bilal
    Geist, Matthieu
    Scherrer, Bruno
    Pietquin, Olivier
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [8] AN APPROXIMATE ALGORITHM FOR EVALUATION OF SEMI-MARKOV RELIABILITY MODELS
    SRICHANDER, R
    WALKER, BK
    [J]. PROCEEDINGS OF THE 1989 AMERICAN CONTROL CONFERENCE, VOLS 1-3, 1989, : 2653 - 2659
  • [9] APPROXIMATE EVALUATION OF SEMI-MARKOV CHAIN RELIABILITY MODELS
    WERELEY, NM
    WALKER, BK
    [J]. RELIABILITY ENGINEERING & SYSTEM SAFETY, 1990, 28 (02) : 133 - 164
  • [10] STOCHASTIC POLICY SEARCH FOR VARIANCE-PENALIZED SEMI-MARKOV CONTROL
    Gosavi, Abhijit
    Purohit, Mandar
    [J]. PROCEEDINGS OF THE 2011 WINTER SIMULATION CONFERENCE (WSC), 2011, : 2860 - 2871