Nonstationary denumerable state Markov decision processes – with average variance criterion


Author: Xianping Guo

Affiliation: Department of Mathematics, Zhongshan University, Guangzhou 510275, P. R. China (e-mail: stsdaiy@zsulink.zsu.edu.cn)

Source: Mathematical Methods of Operations Research, 1999, Vol. 49, No. 1

Keywords: Discrete-time Markov decision processes; average expected criteria; optimality equations; average variance criterion; optimal Markov policies

DOI: 10.1007/PL00020908


Abstract:

In this paper, we consider nonstationary Markov decision processes (MDPs) with an average variance criterion on a countable state space, with finite action spaces and bounded one-step rewards. Using the optimality equations established in this paper, we transform the average variance criterion into a new average expected cost criterion. We then prove that there exists a Markov policy that is optimal for the original average expected reward criterion and that minimizes the average variance within the class of policies optimal for that criterion.
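The Python sketch below makes the variance-minimization step concrete. It rests on simplifying assumptions that are not the paper's setting:

- The MDP is finite, stationary, and unichain.
- Policies are deterministic and stationary.
- The average variance is taken to be the long-run average of (r(i, a) - g)^2, where g is the policy's average reward.

Under that assumed definition, the variance of any gain-optimal policy equals the average expected cost of the one-step cost c(i, a) = (r(i, a) - g*)^2, where g* is the optimal gain. This mirrors the translation described in the abstract. The helper names (stationary_distribution, evaluate, min_variance_among_gain_optimal) and the two-state example are hypothetical. The brute-force enumeration stands in for the optimality-equation argument the paper actually uses.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch only: finite, stationary, unichain MDP (an assumption; the
# paper treats nonstationary models on a countable state space).
# P[i][a] is the list of transition probabilities from state i under action a;
# r[i][a] is the one-step reward.


def stationary_distribution(P):
    """Solve mu P = mu with sum(mu) = 1 exactly (assumes a unichain matrix P)."""
    n = len(P)
    # Rows of (P^T - I); the last row is replaced by the normalization sum(mu) = 1.
    A = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination, pivoting on the first nonzero entry in each column.
    for col in range(n):
        piv = next(row for row in range(col, n) if A[row][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(n):
            if row != col and A[row][col] != 0:
                f = A[row][col] / A[col][col]
                A[row] = [x - f * y for x, y in zip(A[row], A[col])]
                b[row] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def evaluate(policy, P, r):
    """Return (gain, average variance) of a deterministic stationary policy.

    The variance here is the assumed definition sum_i mu(i) * (r(i, f(i)) - g)^2.
    """
    n = len(policy)
    Pf = [P[i][policy[i]] for i in range(n)]
    rf = [Fraction(r[i][policy[i]]) for i in range(n)]
    mu = stationary_distribution(Pf)
    g = sum(m * x for m, x in zip(mu, rf))
    var = sum(m * (x - g) ** 2 for m, x in zip(mu, rf))
    return g, var


def min_variance_among_gain_optimal(P, r):
    """Enumerate all deterministic stationary policies, keep the gain-optimal
    ones, and return the one with the smallest average variance."""
    actions = [range(len(P[i])) for i in range(len(P))]
    results = {pol: evaluate(pol, P, r) for pol in product(*actions)}
    g_star = max(g for g, _ in results.values())
    optimal = {pol: v for pol, (g, v) in results.items() if g == g_star}
    best = min(optimal, key=optimal.get)
    return best, g_star, optimal[best]


h = Fraction(1, 2)
# Hypothetical two-state example. State 0: action 0 pays 3 and moves to 0 or 1
# with probability 1/2 each; action 1 pays 4 and moves to state 1.
# State 1 has a single action paying 0 and returning to state 0.
P = [[[h, h], [0, 1]], [[1, 0]]]
r = [[3, 4], [0]]

best, g_star, var = min_variance_among_gain_optimal(P, r)
print(best, g_star, var)  # prints: (0, 0) 2 2
```

Both policies in this example earn gain 2. Taking action 0 in state 0 gives average variance 2 instead of 4, so that policy is selected. Enumeration grows exponentially with the number of states, so it is only meant for checking small cases by hand.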

Pages: 87–96
