AVERAGE COST OPTIMALITY INEQUALITY FOR MARKOV DECISION PROCESSES WITH BOREL SPACES AND UNIVERSALLY MEASURABLE POLICIES

Cited by: 4

Authors:

Yu, Huizhen [1]

Affiliations:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2N8, Canada

Source:

SIAM JOURNAL ON CONTROL AND OPTIMIZATION | 2020 / Vol. 58 / No. 04

Keywords:

Markov decision processes; Borel spaces; universally measurable policies; average cost; optimality inequality; majorization conditions; OPTIMAL REWARD OPERATOR; EQUATION; ITERATION; CONVERGENCE; THEOREMS; CHAINS;

DOI:

10.1137/19M1239507

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We consider average-cost Markov decision processes (MDPs) with Borel state and action spaces and universally measurable policies. For the nonnegative cost model and an unbounded cost model with a Lyapunov-type stability character, we introduce a set of new conditions under which we prove the average cost optimality inequality (ACOI) via the vanishing discount factor approach. Unlike most existing results on the ACOI, our result does not require any compactness and continuity conditions on the MDPs. Instead, the main idea is to use the almost-uniform-convergence property of a pointwise convergent sequence of measurable functions as asserted in Egoroff's theorem. Our conditions are formulated in order to exploit this property. Among others, we require that for each state, on selected subsets of actions at that state, the state transition stochastic kernel is majorized by finite measures. We combine this majorization property of the transition kernel with Egoroff's theorem to prove the ACOI.
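
For orientation, the two ingredients named in the abstract can be written in a standard textbook form. The notation below (state space $X$, admissible action sets $A(x)$, one-stage cost $c$, transition kernel $q$, discount factor $\alpha$, discounted value function $v_\alpha$, constants $m_\alpha$ and $\rho^*$, relative value function $h$) is assumed here for illustration and is not taken from the paper, whose exact definitions and conditions may differ.

```latex
% Illustrative notation only (assumed, not the paper's own).
% Discounted optimality equation rewritten with h_alpha = v_alpha - m_alpha,
% where m_alpha is a normalizing constant (for example, inf_x v_alpha(x)):
\[
(1-\alpha)\, m_\alpha + h_\alpha(x)
  = \inf_{a \in A(x)} \Big\{ c(x,a) + \alpha \int_X h_\alpha(y)\, q(dy \mid x,a) \Big\},
  \qquad x \in X .
\]
% Average cost optimality inequality (ACOI), the target of the
% vanishing discount factor approach as alpha -> 1:
\[
\rho^* + h(x) \;\ge\; \inf_{a \in A(x)} \Big\{ c(x,a) + \int_X h(y)\, q(dy \mid x,a) \Big\},
  \qquad x \in X .
\]
% Egoroff's theorem: let (\Omega, \mathcal{F}, \mu) be a finite measure space
% and let f_n, f be measurable real-valued functions with f_n -> f mu-a.e.
% Then for every \epsilon > 0 there is a set E in \mathcal{F} such that
\[
\mu(E) < \epsilon
  \quad\text{and}\quad
  \sup_{\omega \in \Omega \setminus E} \lvert f_n(\omega) - f(\omega) \rvert \to 0
  \quad (n \to \infty) .
\]
```

Read this way, the majorization requirement in the abstract supplies finite measures to which Egoroff's theorem can be applied, so that pointwise convergence of the relative value functions along a sequence of discount factors becomes uniform outside a set of small measure.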


Pages: 2469-2502

Page count: 34
