A survey of recent results on continuous-time Markov decision processes

Cited by: 56
Authors
Guo, Xianping [1]
Hernandez-Lerma, Onesimo [1]
Prieto-Rumeau, Tomas [1]
Affiliations
[1] Zhongshan Univ, Beijing, Peoples R China
Keywords
continuous-time Markov decision processes (also known as controlled Markov chains); unbounded reward and transition rates; discounted reward; average reward; bias optimality; sensitive discount criteria
DOI
10.1007/BF02837562
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
Pages: 177-243
Number of pages: 67