Performance optimization of semi-Markov decision processes with discounted-cost criteria

Authors
Yin, Baoqun [1]
Li, Yanjie [1]
Zhou, Yaping [1]
Xi, Hongsheng [1]
Affiliations
[1] University of Science and Technology of China, Department of Automation, Hefei 230026, Anhui, People's Republic of China
Keywords
semi-Markov decision processes; discounted Poisson equation; alpha-potential; discounted-cost criteria; policy iteration; value iteration
DOI
10.3166/EJC.14.213-222
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We study discounted-cost performance optimization for a class of semi-Markov decision processes (SMDPs). We define a matrix that can serve as the infinitesimal generator of a Markov process. Using this matrix, we propose the discounted Poisson equation for an SMDP, from which the α-potential is defined. The optimality equation satisfied by the optimal stationary policy is given, and the relation between the discounted model and the average model is discussed. Two iterative algorithms for finding ε-optimal policies are proposed, and their convergence is proved. A numerical example illustrates the application of the algorithms.
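The abstract does not give the equations themselves, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes the generator-like matrix A and a cost-rate vector f are already available for each action, takes the discounted Poisson equation in the standard continuous-time form (αI − A) g = f, and runs a textbook policy iteration on the resulting α-potential g. The names `solve`, `alpha_potential`, `q_value`, `policy_iteration`, and the two-state data are all hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            k = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= k * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def alpha_potential(A, f, alpha):
    """Assumed form of the discounted Poisson equation: (alpha*I - A) g = f."""
    n = len(f)
    M = [[(alpha if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    return solve(M, f)


def q_value(gen, cost, g, i, a):
    """Right-hand side of the optimality equation for state i, action a."""
    return cost[i][a] + sum(gen[i][a][j] * g[j] for j in range(len(g)))


def policy_iteration(gen, cost, alpha, max_iter=100):
    """gen[i][a] is the generator row of state i under action a,
    cost[i][a] the corresponding cost rate (both assumed given)."""
    n = len(gen)
    policy = [0] * n
    g = [0.0] * n
    for _ in range(max_iter):
        # Policy evaluation: alpha-potential of the current policy.
        A = [gen[i][policy[i]] for i in range(n)]
        f = [cost[i][policy[i]] for i in range(n)]
        g = alpha_potential(A, f, alpha)
        # Policy improvement: switch only on a strict improvement.
        new_policy = []
        for i in range(n):
            best = min(range(len(gen[i])),
                       key=lambda a: q_value(gen, cost, g, i, a))
            if q_value(gen, cost, g, i, policy[i]) <= \
                    q_value(gen, cost, g, i, best) + 1e-12:
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, g


# Hypothetical two-state, two-action data; each generator row sums to zero.
gen = [
    [[-1.0, 1.0], [-3.0, 3.0]],
    [[2.0, -2.0], [0.5, -0.5]],
]
cost = [[1.0, 2.5], [4.0, 3.0]]
alpha = 0.1

policy, g = policy_iteration(gen, cost, alpha)
print(policy, g)
```

At termination the returned pair satisfies, up to rounding, α g(i) = min over a of { f(i, a) + Σ_j A_a(i, j) g(j) }, which is the usual shape of a discounted optimality equation in continuous time. How the paper builds A from the semi-Markov kernel is not stated in the abstract and is not modelled here.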
Pages: 213-222
Number of pages: 10