Potential-based online policy iteration algorithms for Markov decision processes

Cited by: 26
Authors
Fang, HT [1]
Cao, XR
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Lab Syst & Control, Beijing 100080, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Markov decision process; potential; recursive optimization
DOI
10.1109/TAC.2004.825647
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
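To make the role of potentials in policy iteration concrete, here is a minimal offline sketch. It is not the paper's online algorithm: the potentials are computed exactly by solving the Poisson equation g - Pg + eta*1 = f with the normalization g[0] = 0, whereas the paper estimates them from a sample path and combines them with stochastic approximation. The function names, the two-state toy model, and the average-reward maximization convention are all assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def potentials(P, f):
    """Return (g, eta) solving g - P g + eta * 1 = f with g[0] = 0."""
    n = len(f)
    # Unknown vector is (g[1], ..., g[n-1], eta); g[0] is fixed at 0.
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(1, n)] + [1.0]
         for i in range(n)]
    x = solve(A, f)
    return [0.0] + x[:-1], x[-1]


def policy_iteration(P, f, policy):
    """P[i][a] is a transition row, f[i][a] the reward of action a in state i."""
    while True:
        # Policy evaluation: potentials and average reward of the current policy.
        Pd = [P[i][a] for i, a in enumerate(policy)]
        fd = [f[i][a] for i, a in enumerate(policy)]
        g, eta = potentials(Pd, fd)
        # Policy improvement: maximize f(i, a) + sum_j p(j | i, a) g(j).
        new_policy = []
        for i, current in enumerate(policy):
            def score(a):
                return f[i][a] + sum(p * gj for p, gj in zip(P[i][a], g))
            best = max(range(len(f[i])), key=score)
            # Keep the current action on ties so the loop terminates.
            if score(best) <= score(current) + 1e-12:
                best = current
            new_policy.append(best)
        if new_policy == policy:
            return policy, eta
        policy = new_policy


# Hypothetical two-state, two-action model (numbers chosen for illustration).
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.5, 0.5], [0.2, 0.8]]]
f = [[1.0, 0.5],
     [0.0, 2.0]]
policy, eta = policy_iteration(P, f, [0, 0])
print(policy, eta)  # for this toy model: policy [0, 1], eta close to 12/7
```

In an online variant, the exact call to `potentials` would be replaced by estimates gathered along a single sample path, which is the setting the abstract describes.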
Pages: 493-505
Number of pages: 13