A policy improvement method in constrained stochastic dynamic programming

Cited by: 9
Author
Chang, Hyeong Soo [1]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea
[2] Sogang Univ, Program Integrated Biotechnol, Seoul 121742, South Korea
Keywords
constrained Markov decision process; dynamic programming; policy improvement; policy iteration;
DOI
10.1109/TAC.2006.880801
CLC classification
TP [Automation and Computer Technology]
Discipline classification code
0812
Abstract
This note presents a formal method of improving a given base-policy such that the performance of the resulting policy is no worse than that of the base-policy at all states in constrained stochastic dynamic programming. We consider finite-horizon and discounted infinite-horizon cases. The improvement method induces a policy iteration-type algorithm that converges to a locally optimal policy.
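The record contains only the abstract, not the paper's construction. As a rough, hypothetical illustration of the kind of state-wise improvement the abstract describes, here is a minimal sketch for a toy discounted constrained MDP: at each state the policy switches to an action only if doing so raises the reward Q-value without raising the constraint-cost Q-value. The toy numbers and the conservative switch rule are assumptions for illustration, not Chang's actual method.

```python
import numpy as np

# Toy discounted constrained MDP (all numbers are illustrative assumptions):
# 2 states, 2 actions; P[a, s, s'] = transition probabilities,
# r[a, s] = reward, c[a, s] = constraint cost.
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # action 0
              [[0.1, 0.9], [0.6, 0.4]]])  # action 1
r = np.array([[1.0, 0.0],
              [2.0, 0.5]])
c = np.array([[0.0, 1.0],
              [1.5, 0.2]])

def evaluate(pi):
    """Exact policy evaluation: reward value V and constraint-cost value C of
    a deterministic stationary policy pi (solves (I - gamma*P_pi) x = r_pi)."""
    Ppi = np.array([P[pi[s], s] for s in range(2)])
    A = np.eye(2) - gamma * Ppi
    rpi = np.array([r[pi[s], s] for s in range(2)])
    cpi = np.array([c[pi[s], s] for s in range(2)])
    return np.linalg.solve(A, rpi), np.linalg.solve(A, cpi)

def improve(pi):
    """One conservative improvement step: switch the action at a state only if
    the reward Q-value strictly improves and the cost Q-value does not grow,
    so the new policy is no worse than the base-policy at every state."""
    V, C = evaluate(pi)
    new_pi = pi.copy()
    for s in range(2):
        qr_base = r[pi[s], s] + gamma * P[pi[s], s] @ V
        qc_base = c[pi[s], s] + gamma * P[pi[s], s] @ C
        for a in range(2):
            q_r = r[a, s] + gamma * P[a, s] @ V
            q_c = c[a, s] + gamma * P[a, s] @ C
            if q_r > qr_base and q_c <= qc_base:
                new_pi[s] = a
    return new_pi

base = np.array([0, 0])
improved = improve(base)
V0, C0 = evaluate(base)
V1, C1 = evaluate(improved)
# By the standard policy-improvement argument, V1 >= V0 and C1 <= C0
# component-wise, i.e. no worse at all states in reward or in cost.
```

Iterating `improve` until the policy stops changing yields a policy iteration-type loop of the kind the abstract mentions; with a state-wise rule like this, only convergence to a locally optimal policy can be expected.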
Pages: 1523-1526
Page count: 4