Computing semi-stationary optimal policies for multichain semi-Markov decision processes

Cited by: 1

Authors:

Mondal, Prasenjit [1]

Affiliations:

[1] Govt Gen Degree Coll, Dept Math, Ranibandh 722135, Bankura, India

Source:

ANNALS OF OPERATIONS RESEARCH | 2020 / Vol. 287 / Issue 02

Keywords:

Semi-Markov decision processes; Limiting ratio average reward; Multichain structure; Pure optimal semi-stationary policies; Linear programming;

DOI:

10.1007/s10479-017-2686-x

Chinese Library Classification:

C93 [Management Science]; O22 [Operations Research];

Subject classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

We consider semi-Markov decision processes with finite state and action spaces and a general multichain structure. A form of limiting ratio average (undiscounted) reward is the criterion for comparing different policies. The main result is that the value vector and a pure optimal semi-stationary policy (i.e., a policy which depends only on the initial state and the current state) for such an SMDP can be computed directly from an optimal solution of a finite set (whose cardinality equals the number of states) of linear programming (LP) problems. To be more precise, we prove that the single LP associated with a fixed initial state provides the value and an optimal pure stationary policy of the corresponding SMDP. The relation between the set of feasible solutions of each LP and the set of stationary policies is also analyzed. Examples are worked out to describe the algorithm.

Pages: 843 - 865

Number of pages: 23

50 items in total

[41] Computing optimal stationary policies for multi-objective Markov decision processes
Wiering, Marco A.
de Jong, Edwin D.
[J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 158 - +
[42] Optimal stopping time on discounted semi-Markov processes
Chen, Fang
Guo, Xianping
Liao, Zhong-Wei
[J]. FRONTIERS OF MATHEMATICS IN CHINA, 2021, 16 (02) : 303 - 324
[43] Optimal stopping time on discounted semi-Markov processes
Fang Chen
Xianping Guo
Zhong-Wei Liao
[J]. Frontiers of Mathematics in China, 2021, 16 : 303 - 324
[44] Optimality of Quasi-Open-Loop Policies for Discounted Semi-Markov Decision Processes
Adelman, Daniel
Mancini, Angelo J.
[J]. MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1222 - 1247
[45] Semi-Markov processes for coverage modeling and optimal maintenance policies of an automated restoration mechanism
Grigoriadou, H. C.
Koutras, V. P.
Platis, A. N.
[J]. ADVANCES IN SAFETY, RELIABILITY AND RISK MANAGEMENT, 2012, : 949 - 956
[46] ADDITIONAL QUASI-STATIONARY DISTRIBUTIONS FOR SEMI-MARKOV PROCESSES
FLASPOHLER, DC
HOLMES, PT
[J]. JOURNAL OF APPLIED PROBABILITY, 1972, 9 (03) : 671 - +
[47] COMPARISON OF SEMI-MARKOV AND MARKOV PROCESSES
KURTZ, TG
[J]. ANNALS OF MATHEMATICAL STATISTICS, 1971, 42 (03) : 991 - &
[48] Asymptotic Expansions for Stationary Distributions of Perturbed Semi-Markov Processes
Silvestrov, Dmitrii
Silvestrov, Sergei
[J]. 2016 SECOND INTERNATIONAL SYMPOSIUM ON STOCHASTIC MODELS IN RELIABILITY ENGINEERING, LIFE SCIENCE AND OPERATIONS MANAGEMENT (SMRLO), 2016, : 41 - 46
[49] ON REVERSIBLE SEMI-MARKOV PROCESSES
CHARI, MK
[J]. OPERATIONS RESEARCH LETTERS, 1994, 15 (03) : 157 - 161
[50] IMBEDDED SEMI-MARKOV PROCESSES
BRODI, SM
[J]. TEORIYA VEROYATNOSTEI I YEYE PRIMENIYA, 1975, 20 (02) : 450 - 452
