Potential-based online policy iteration algorithms for Markov decision processes

Cited by: 26
Authors
Fang, HT [1]
Cao, XR
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Lab Syst & Control, Beijing 100080, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Markov decision process; potential; recursive optimization;
DOI
10.1109/TAC.2004.825647
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
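To make the role of potentials in policy iteration concrete, the following is a minimal sketch and not the paper's online algorithms: it computes potentials from a known model rather than estimating them on a sample path. It assumes an ergodic, aperiodic chain under every policy and the average-reward criterion, with the potential taken as g(i) = sum over t >= 0 of E[f(X_t) - eta | X_0 = i]. The two-state MDP and all function names are hypothetical, chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP (illustrative numbers only):
# MDP[state][action] = (transition row, one-step reward)
MDP = [
    [([0.9, 0.1], 1.0), ([0.2, 0.8], 0.5)],
    [([0.5, 0.5], 2.0), ([0.1, 0.9], 3.0)],
]


def stationary(P, iters=2000):
    """Stationary distribution by power iteration (assumes an ergodic, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def potentials(P, f, horizon=2000):
    """Average reward eta and potentials g, with g(i) approximated by a
    truncated sum of P^t (f - eta) over t = 0 .. horizon - 1."""
    n = len(P)
    pi = stationary(P)
    eta = sum(pi[i] * f[i] for i in range(n))
    term = [f[i] - eta for i in range(n)]  # holds P^t (f - eta), starting at t = 0
    g = [0.0] * n
    for _ in range(horizon):
        g = [g[i] + term[i] for i in range(n)]
        term = [sum(P[i][j] * term[j] for j in range(n)) for i in range(n)]
    return eta, g


def policy_iteration(mdp, policy):
    """Potential-based policy iteration: evaluate the potentials of the
    current policy, then improve each state's action greedily."""
    while True:
        P = [mdp[s][a][0] for s, a in enumerate(policy)]
        f = [mdp[s][a][1] for s, a in enumerate(policy)]
        eta, g = potentials(P, f)
        new_policy = []
        for s, current in enumerate(policy):
            def score(a):
                row, reward = mdp[s][a]
                return reward + sum(p * gj for p, gj in zip(row, g))
            best = current
            for a in range(len(mdp[s])):
                # Switch only on a strict improvement, to avoid cycling on ties.
                if score(a) > score(best) + 1e-9:
                    best = a
            new_policy.append(best)
        if new_policy == policy:
            return policy, eta
        policy = new_policy


best_policy, best_eta = policy_iteration(MDP, [0, 0])
print(best_policy, round(best_eta, 4))
```

In an online variant of the kind the abstract describes, the call to `potentials` would be replaced by estimates gathered along a single sample path; the improvement step would keep the same form.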
Pages: 493 - 505
Page count: 13
Related papers
50 in total
  • [21] Approximate policy iteration with a policy language bias: Solving relational Markov decision processes
    Fern, A
    Yoon, S
    Givan, R
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2006, 25 : 75 - 118
  • [22] Mean Field Approximation of the Policy Iteration Algorithm for Graph-based Markov Decision Processes
    Peyrard, Nathalie
    Sabbadin, Regis
    ECAI 2006, PROCEEDINGS, 2006, 141 : 595 - +
  • [23] Variance reduced value iteration and faster algorithms for solving Markov decision processes
    Sidford, Aaron
    Wang, Mengdi
    Wu, Xian
    Ye, Yinyu
    NAVAL RESEARCH LOGISTICS, 2023, 70 (05) : 423 - 442
  • [24] Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
    Sidford, Aaron
    Wang, Mengdi
    Wu, Xian
    Ye, Yinyu
    SODA'18: PROCEEDINGS OF THE TWENTY-NINTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2018 : 770 - 787
  • [25] Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations
    Abate, Alessandro
    Ceska, Milan
    Kwiatkowska, Marta
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2016, 2016, 9938 : 13 - 31
  • [26] Partial policy iteration for L1-Robust Markov decision processes
    Ho, Chin Pang
    Petrik, Marek
    Wiesemann, Wolfram
    Journal of Machine Learning Research, 2021, 22
  • [27] Cosine Policy Iteration for Solving Infinite-Horizon Markov Decision Processes
    Frausto-Solis, Juan
    Santiago, Elizabeth
    Mora-Vargas, Jaime
    MICAI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, 5845 : 75 - +
  • [28] A note on the convergence of policy iteration in Markov decision processes with compact action spaces
    Golubin, AY
    MATHEMATICS OF OPERATIONS RESEARCH, 2003, 28 (01) : 194 - 200
  • [29] Inexact GMRES Policy Iteration for Large-Scale Markov Decision Processes
    Gargiani, Matilde
    Liao-McPherson, Dominic
    Zanelli, Andrea
    Lygeros, John
    IFAC PAPERSONLINE, 2023, 56 (02) : 11249 - 11254
  • [30] Robust topological policy iteration for infinite horizon bounded Markov Decision Processes
    Silva Reis, Willy Arthur
    de Barros, Leliane Nunes
    Delgado, Karina Valdivia
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2019, 105 : 287 - 304