Semi-Markov decision problems and performance sensitivity analysis

Cited by: 53
Author
Cao, XR [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Ctr Networking, Kowloon, Hong Kong, Peoples R China
Keywords
discounted Poisson equations; discrete-event dynamic systems (DEDS); Lyapunov equations; Markov decision processes (MDPs); perturbation analysis (PA); perturbation realization; Poisson equations; policy iteration; potentials; reinforcement learning (RL);
DOI
10.1109/TAC.2003.811252
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view, and that perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in the optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (SMPs); second, we extend the aforementioned results on the relation among PA, MDPs, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms for semi-Markov decision processes (SMDPs) can be derived from the performance potential and realization matrix. Both the long-run average and the discounted-cost problems are considered; the approach provides a unified framework for both, with the long-run average problem corresponding to the discount factor being zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Implementations based on a single sample path are discussed.
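As a concrete illustration of the potential-based view summarized in the abstract, the following is a minimal Python sketch of model-based, average-cost policy iteration for a finite unichain SMDP. It assumes the embedded transition probabilities P[a, i, j], the mean sojourn times tau[a, i], and the expected per-visit costs cost[a, i] are given; all array and function names are hypothetical. It is the standard textbook scheme that the paper's sensitivity formulas lead to, not the paper's single-sample-path algorithm.

import numpy as np

def smdp_policy_iteration(P, tau, cost, max_iter=100):
    """Average-cost policy iteration for a finite unichain SMDP via potentials.

    P[a, i, j] : embedded-chain transition probabilities under action a
    tau[a, i]  : mean sojourn time in state i under action a
    cost[a, i] : expected cost accumulated per visit to state i under action a
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)           # arbitrary initial policy
    states = np.arange(n_states)

    for _ in range(max_iter):
        Pd, taud, cd = P[policy, states], tau[policy, states], cost[policy, states]

        # Policy evaluation: solve the semi-Markov Poisson equation
        #   (I - Pd) g + eta * taud = cd,  with the normalization g[0] = 0,
        # for the potentials g and the average cost eta.
        A = np.zeros((n_states + 1, n_states + 1))
        A[:n_states, :n_states] = np.eye(n_states) - Pd
        A[:n_states, n_states] = taud                # column multiplying the unknown eta
        A[n_states, 0] = 1.0                         # pins down g[0] = 0
        b = np.append(cd, 0.0)
        sol = np.linalg.solve(A, b)
        g, eta = sol[:n_states], sol[n_states]

        # Policy improvement: act greedily with respect to the potentials.
        Q = cost - eta * tau + P @ g                 # Q[a, i] = cost[a,i] - eta*tau[a,i] + sum_j P[a,i,j] g[j]
        new_policy = Q.argmin(axis=0)
        if np.array_equal(new_policy, policy):
            break                                    # policy is greedy w.r.t. its own potentials
        policy = new_policy

    return policy, eta, g

# Example usage with made-up numbers: 2 states, 2 actions.
P = np.array([[[0.2, 0.8], [0.6, 0.4]],
              [[0.9, 0.1], [0.3, 0.7]]])
tau = np.array([[1.0, 2.0], [0.5, 3.0]])
cost = np.array([[4.0, 1.0], [2.0, 5.0]])
print(smdp_policy_iteration(P, tau, cost))

Note that only the mean sojourn times enter the computation, consistent with the abstract's remark that sensitivities and optimization depend only on first-order statistics.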
Pages: 758 - 769
Page count: 12
Related Papers
50 records in total
  • [11] Mean-Variance Problems for Finite Horizon Semi-Markov Decision Processes
    Huang, Yonghui
    Guo, Xianping
    APPLIED MATHEMATICS AND OPTIMIZATION, 2015, 72 (02): 233 - 259
  • [12] Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors
    Vien, Ngo Anh
    Lee, SeungGwan
    Chung, TaeChoong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (02): 271 - 279
  • [13] Solving semi-Markov decision problems using average reward reinforcement learning
    Das, TK
    Gosavi, A
    Mahadevan, S
    Marchalleck, N
    MANAGEMENT SCIENCE, 1999, 45 (04): 560 - 574
  • [15] Generalized semi-Markov decision processes
    Doshi, BT
    JOURNAL OF APPLIED PROBABILITY, 1979, 16 (03): 618 - 630
  • [16] Semi-Markov decision processes with unbounded rewards
    Lippman, SA
    MANAGEMENT SCIENCE SERIES A-THEORY, 1973, 19 (07): 717 - 731
  • [18] Policy Gradient Semi-Markov Decision Process
    Vien, Ngo Anh
    Chung, TaeChoong
    20TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL 2, PROCEEDINGS, 2008: 11 - 18
  • [19] Average cost semi-Markov decision processes
    Ross, SM
    JOURNAL OF APPLIED PROBABILITY, 1970, 7 (03): 649
  • [20] Constrained discounted semi-Markov decision processes
    Feinberg, EA
    MARKOV PROCESSES AND CONTROLLED MARKOV CHAINS, 2002: 233 - 244