On ordinal comparison of policies in Markov reward processes

Cited by: 2

Authors:

Chang, HS [1]

Affiliation:

[1] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea

Source:

JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS | 2004 / Vol. 122 / No. 01

Keywords:

ordinal comparisons; large deviations; stochastic simulations; Markov reward processes;

DOI:

10.1023/B:JOTA.0000041736.82051.f1

Chinese Library Classification (CLC) number:

C93 [Management science]; O22 [Operations research];

Discipline classification code:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

An asymptotic exponential convergence rate of ordinal comparison from large deviations theory is well known for selecting the true best solution from the candidate solutions sample means. This note supplements the theories developed by Dai within the framework of ergodic Markov reward processes for epsilon-ordinal comparison of policies, establishing an asymptotic exponential convergence rate for the infinite-horizon average criterion.


Pages: 207 - 217

Number of pages: 11


[1] Technical Note: On Ordinal Comparison of Policies in Markov Reward Processes
H. S. Chang
Journal of Optimization Theory and Applications, 2004, 122 : 207 - 217
[2] Incremental Improvements of Heuristic Policies for Average-Reward Markov Decision Processes
Reveliotis, S.
Ibrahim, M.
IFAC PAPERSONLINE, 2020, 53 (02) : 1721 - 1728
[3] Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes
Feinberg, Eugene A.
Rothblum, Uriel G.
MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (01) : 129 - 153
[4] Markov Decision Processes with Arbitrary Reward Processes
Yu, Jia Yuan
Mannor, Shie
Shimkin, Nahum
RECENT ADVANCES IN REINFORCEMENT LEARNING, 2008, 5323 : 268 - +
[5] Markov Decision Processes with Arbitrary Reward Processes
Yu, Jia Yuan
Mannor, Shie
Shimkin, Nahum
MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 737 - 757
[6] Ordinal Decision Models for Markov Decision Processes
Weng, Paul
20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 828 - 833
[7] A preorder relation for Markov reward processes
Daly, David
Buchholz, Peter
Sanders, William H.
STATISTICS & PROBABILITY LETTERS, 2007, 77 (11) : 1148 - 1157
[8] Adaptive optimization of Markov reward processes
Campos-Nanez, Enrique
Patek, Stephen D.
2005 44TH IEEE CONFERENCE ON DECISION AND CONTROL & EUROPEAN CONTROL CONFERENCE, VOLS 1-8, 2005 : 8034 - 8041
[9] Distributed optimization of Markov reward processes
Campos-Nanez, Enrique
PROCEEDINGS OF THE 46TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2007 : 3921 - 3926
[10] EXISTENCE OF OPTIMAL STATIONARY POLICIES IN AVERAGE REWARD MARKOV DECISION-PROCESSES WITH A RECURRENT STATE
CAVAZOSCADENA, R
APPLIED MATHEMATICS AND OPTIMIZATION, 1992, 26 (02) : 171 - 194
