On ordinal comparison of policies in Markov reward processes

Cited by: 2
Authors
Chang, HS [1]
Affiliation
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea
Keywords
ordinal comparisons; large deviations; stochastic simulations; Markov reward processes;
DOI
10.1023/B:JOTA.0000041736.82051.f1
Chinese Library Classification
C93 [Management science]; O22 [Operations research];
Subject classification codes
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
An asymptotic exponential convergence rate of ordinal comparison, derived from large deviations theory, is well known for selecting the true best solution on the basis of the candidate solutions' sample means. This note supplements the theories developed by Dai within the framework of ergodic Markov reward processes for epsilon-ordinal comparison of policies, establishing an asymptotic exponential convergence rate for the infinite-horizon average criterion.
Pages: 207-217
Page count: 11
Related papers
  • [1] Technical Note: On Ordinal Comparison of Policies in Markov Reward Processes
    H. S. Chang
    Journal of Optimization Theory and Applications, 2004, 122 : 207 - 217
  • [2] Incremental Improvements of Heuristic Policies for Average-Reward Markov Decision Processes
    Reveliotis, S.
    Ibrahim, M.
    IFAC PAPERSONLINE, 2020, 53 (02): : 1721 - 1728
  • [3] Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes
    Feinberg, Eugene A.
    Rothblum, Uriel G.
    MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (01) : 129 - 153
  • [4] Markov Decision Processes with Arbitrary Reward Processes
    Yu, Jia Yuan
    Mannor, Shie
    Shimkin, Nahum
    RECENT ADVANCES IN REINFORCEMENT LEARNING, 2008, 5323 : 268 - +
  • [5] Markov Decision Processes with Arbitrary Reward Processes
    Yu, Jia Yuan
    Mannor, Shie
    Shimkin, Nahum
    MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 737 - 757
  • [6] Ordinal Decision Models for Markov Decision Processes
    Weng, Paul
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 828 - 833
  • [7] A preorder relation for Markov reward processes
    Daly, David
    Buchholz, Peter
    Sanders, William H.
    STATISTICS & PROBABILITY LETTERS, 2007, 77 (11) : 1148 - 1157
  • [8] Adaptive optimization of Markov reward processes
    Campos-Nanez, Enrique
    Patek, Stephen D.
    2005 44TH IEEE CONFERENCE ON DECISION AND CONTROL & EUROPEAN CONTROL CONFERENCE, VOLS 1-8, 2005, : 8034 - 8041
  • [9] Distributed optimization of Markov reward processes
    Campos-Nanez, Enrique
    PROCEEDINGS OF THE 46TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2007, : 3921 - 3926
  • [10] EXISTENCE OF OPTIMAL STATIONARY POLICIES IN AVERAGE REWARD MARKOV DECISION-PROCESSES WITH A RECURRENT STATE
    CAVAZOSCADENA, R
    APPLIED MATHEMATICS AND OPTIMIZATION, 1992, 26 (02): : 171 - 194