Computing semi-stationary optimal policies for multichain semi-Markov decision processes

Cited by: 1
Author
Mondal, Prasenjit [1]
Affiliation
[1] Govt Gen Degree Coll, Dept Math, Ranibandh 722135, Bankura, India
Keywords
Semi-Markov decision processes; Limiting ratio average reward; Multichain structure; Pure optimal semi-stationary policies; Linear programming
DOI
10.1007/s10479-017-2686-x
Chinese Library Classification
C93 [Management]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
We consider semi-Markov decision processes with finite state and action spaces and a general multichain structure. A form of limiting ratio average (undiscounted) reward is the criterion for comparing different policies. The main result is that the value vector and a pure optimal semi-stationary policy (i.e., a policy which depends only on the initial state and the current state) for such an SMDP can be computed directly from an optimal solution of a finite set (whose cardinality equals the number of states) of linear programming (LP) problems. To be more precise, we prove that the single LP associated with a fixed initial state provides the value and an optimal pure stationary policy of the corresponding SMDP. The relation between the set of feasible solutions of each LP and the set of stationary policies is also analyzed. Examples are worked out to describe the algorithm.
Pages: 843 - 865
Page count: 23
Related papers
50 in total
  • [41] Computing optimal stationary policies for multi-objective Markov decision processes
    Wiering, Marco A.
    de Jong, Edwin D.
    [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 158 - +
  • [42] Optimal stopping time on discounted semi-Markov processes
    Chen, Fang
    Guo, Xianping
    Liao, Zhong-Wei
    [J]. FRONTIERS OF MATHEMATICS IN CHINA, 2021, 16 (02) : 303 - 324
  • [43] Optimal stopping time on discounted semi-Markov processes
    Fang Chen
    Xianping Guo
    Zhong-Wei Liao
    [J]. Frontiers of Mathematics in China, 2021, 16 : 303 - 324
  • [44] Optimality of Quasi-Open-Loop Policies for Discounted Semi-Markov Decision Processes
    Adelman, Daniel
    Mancini, Angelo J.
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1222 - 1247
  • [45] Semi-Markov processes for coverage modeling and optimal maintenance policies of an automated restoration mechanism
    Grigoriadou, H. C.
    Koutras, V. P.
    Platis, A. N.
    [J]. ADVANCES IN SAFETY, RELIABILITY AND RISK MANAGEMENT, 2012, : 949 - 956
  • [46] ADDITIONAL QUASI-STATIONARY DISTRIBUTIONS FOR SEMI-MARKOV PROCESSES
    FLASPOHLER, DC
    HOLMES, PT
    [J]. JOURNAL OF APPLIED PROBABILITY, 1972, 9 (03) : 671 - +
  • [47] COMPARISON OF SEMI-MARKOV AND MARKOV PROCESSES
    KURTZ, TG
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1971, 42 (03) : 991 - &
  • [48] Asymptotic Expansions for Stationary Distributions of Perturbed Semi-Markov Processes
    Silvestrov, Dmitrii
    Silvestrov, Sergei
    [J]. 2016 SECOND INTERNATIONAL SYMPOSIUM ON STOCHASTIC MODELS IN RELIABILITY ENGINEERING, LIFE SCIENCE AND OPERATIONS MANAGEMENT (SMRLO), 2016, : 41 - 46
  • [49] ON REVERSIBLE SEMI-MARKOV PROCESSES
    CHARI, MK
    [J]. OPERATIONS RESEARCH LETTERS, 1994, 15 (03) : 157 - 161
  • [50] IMBEDDED SEMI-MARKOV PROCESSES
    BRODI, SM
    [J]. TEORIYA VEROYATNOSTEI I YEYE PRIMENIYA, 1975, 20 (02) : 450 - 452