Semi-Markov decision problems and performance sensitivity analysis

Cited by: 53

Authors:

Cao, XR [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Ctr Networking, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2003 / Vol. 48 / Issue 05

Keywords:

discounted Poisson equations; discrete-event dynamic systems (DEDS); Lyapunov equations; Markov decision processes (MDPs); perturbation analysis (PA); perturbation realization; Poisson equations; policy iteration; potentials; reinforcement learning (RL);

DOI:

10.1109/TAC.2003.811252

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view, and that perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in the optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (SMPs); second, we extend the aforementioned results about the relation among PA, MDPs, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms for semi-Markov decision processes (SMDPs) can be derived based on performance potentials and realization matrices. Both the long-run average and discounted-cost problems are considered; this approach provides a unified framework for both problems, and the long-run average problem corresponds to the discount factor being zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Single sample path-based implementations are discussed.

Pages: 758 - 769

Number of pages: 12

50 items in total

[31] CONVERGENCE OF SEMI-MARKOV WALKS TO A CONTINUOUS SEMI-MARKOV PROCESS
KHARLAMOV, BP
THEORY OF PROBABILITY AND ITS APPLICATIONS, 1976, 21 (03) : 482 - 498
[32] Leader-follower semi-Markov decision problems: Theoretical framework and approximate solution
Tharakunnel, Kurian
Bhattacharyya, Siddhartha
2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 111 - +
[33] CONVERGENCE OF SEMI-MARKOV WANDERINGS TO SEMI-MARKOV CONTINUOUS PROCESS
KHARLAMOV, BP
TEORIYA VEROYATNOSTEI I YEYE PRIMENIYA, 1975, 20 (03) : 679 - 680
[34] Using Semi-Markov Chains to Solve Semi-Markov Processes
Bei Wu
Brenda Ivette Garcia Maya
Nikolaos Limnios
Methodology and Computing in Applied Probability, 2021, 23 : 1419 - 1431
[35] SYSTEM ANALYSIS OF SEMI-MARKOV PROCESSES
HOWARD, RA
IEEE TRANSACTIONS ON MILITARY ELECTRONICS, 1964, MIL8 (02): : 114 - &
[36] SEMI-MARKOV ANALYSIS OF A BULK QUEUE
NEUTS, MF
BULLETIN OF THE INTERNATIONAL STATISTICAL INSTITUTE, 1965, 41 (02) : 827 - 827
[37] OPTIMIZATION OF DENUMERABLE SEMI-MARKOV DECISION PROCESSES.
Staniewski, Piotr
Weinfeld, Roman
Systems Science, 1980, 6 (02) : 129 - 141
[38] Semi-Markov decision processes with variance minimization criterion
Wei, Qingda
Guo, Xianping
4OR-A QUARTERLY JOURNAL OF OPERATIONS RESEARCH, 2015, 13 (01) : 59 - 79
[39] On the second optimality equation for semi-Markov decision models
Schal, Manfred
Mathematics of Operations Research, 1992, 17 (02)
[40] Semi-Markov Based Maintenance Decision for Production System
Wu, Jianlong
Xiao, Boping
Yang, Liying
Zhao, Zhonghao
2018 3RD INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY (ICSRS), 2018, : 340 - 345
