Semi-Markov decision problems and performance sensitivity analysis

Cited by: 53
Author
Cao, XR [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Ctr Networking, Kowloon, Hong Kong, Peoples R China
Keywords
discounted Poisson equations; discrete-event dynamic systems (DEDS); Lyapunov equations; Markov decision processes (MDPs); perturbation analysis (PA); perturbation realization; Poisson equations; policy iteration; potentials; reinforcement learning (RL);
DOI
10.1109/TAC.2003.811252
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view, and that perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in the optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (SMPs); second, we extend the aforementioned results on the relation among PA, MDPs, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms for semi-Markov decision processes (SMDPs) can be derived from the performance potential and realization matrix. Both the long-run average and the discounted-cost problems are considered; the approach provides a unified framework for both, with the long-run average problem corresponding to the discount factor being zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Implementations based on a single sample path are discussed.
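As a concrete illustration of the potential-based view summarized in the abstract, the following is a minimal Python sketch of model-based, average-cost policy iteration for a finite unichain SMDP. It assumes the embedded transition probabilities P[a, i, j], the mean sojourn times tau[a, i], and the expected per-visit costs cost[a, i] are given; all array and function names are hypothetical. It is the standard textbook scheme that the paper's sensitivity formulas lead to, not the paper's single-sample-path algorithm.

import numpy as np

def smdp_policy_iteration(P, tau, cost, max_iter=100):
    """Average-cost policy iteration for a finite unichain SMDP via potentials.

    P[a, i, j] : embedded-chain transition probabilities under action a
    tau[a, i]  : mean sojourn time in state i under action a
    cost[a, i] : expected cost accumulated per visit to state i under action a
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)           # arbitrary initial policy
    states = np.arange(n_states)

    for _ in range(max_iter):
        Pd, taud, cd = P[policy, states], tau[policy, states], cost[policy, states]

        # Policy evaluation: solve the semi-Markov Poisson equation
        #   (I - Pd) g + eta * taud = cd,  with the normalization g[0] = 0,
        # for the potentials g and the average cost eta.
        A = np.zeros((n_states + 1, n_states + 1))
        A[:n_states, :n_states] = np.eye(n_states) - Pd
        A[:n_states, n_states] = taud                # column multiplying the unknown eta
        A[n_states, 0] = 1.0                         # pins down g[0] = 0
        b = np.append(cd, 0.0)
        sol = np.linalg.solve(A, b)
        g, eta = sol[:n_states], sol[n_states]

        # Policy improvement: act greedily with respect to the potentials.
        Q = cost - eta * tau + P @ g                 # Q[a, i] = cost[a,i] - eta*tau[a,i] + sum_j P[a,i,j] g[j]
        new_policy = Q.argmin(axis=0)
        if np.array_equal(new_policy, policy):
            break                                    # policy is greedy w.r.t. its own potentials
        policy = new_policy

    return policy, eta, g

# Example usage with made-up numbers: 2 states, 2 actions.
P = np.array([[[0.2, 0.8], [0.6, 0.4]],
              [[0.9, 0.1], [0.3, 0.7]]])
tau = np.array([[1.0, 2.0], [0.5, 3.0]])
cost = np.array([[4.0, 1.0], [2.0, 5.0]])
print(smdp_policy_iteration(P, tau, cost))

Note that only the mean sojourn times enter the computation, consistent with the abstract's remark that sensitivities and optimization depend only on first-order statistics.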
Pages: 758 - 769
Page count: 12
Related Papers
50 records in total
  • [11] Mean-Variance Problems for Finite Horizon Semi-Markov Decision Processes
    Huang, Yonghui
    Guo, Xianping
    APPLIED MATHEMATICS AND OPTIMIZATION, 2015, 72 (02): 233 - 259
  • [12] Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors
    Vien, Ngo Anh
    Lee, SeungGwan
    Chung, TaeChoong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (02): 271 - 279
  • [13] Solving semi-Markov decision problems using average reward reinforcement learning
    Das, TK
    Gosavi, A
    Mahadevan, S
    Marchalleck, N
    MANAGEMENT SCIENCE, 1999, 45 (04): 560 - 574
  • [15] Generalized semi-Markov decision processes
    Doshi, BT
    JOURNAL OF APPLIED PROBABILITY, 1979, 16 (03): 618 - 630
  • [16] Semi-Markov decision processes with unbounded rewards
    Lippman, SA
    MANAGEMENT SCIENCE SERIES A-THEORY, 1973, 19 (07): 717 - 731
  • [18] Policy Gradient Semi-Markov Decision Process
    Vien, Ngo Anh
    Chung, TaeChoong
    20TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL 2, PROCEEDINGS, 2008: 11 - 18
  • [19] Average cost semi-Markov decision processes
    Ross, SM
    JOURNAL OF APPLIED PROBABILITY, 1970, 7 (03): 649
  • [20] Constrained discounted semi-Markov decision processes
    Feinberg, EA
    MARKOV PROCESSES AND CONTROLLED MARKOV CHAINS, 2002: 233 - 244