ASYNCHRONOUS VALUE ITERATION FOR MARKOV DECISION PROCESSES WITH CONTINUOUS STATE SPACES

Cited by: 2
Authors
Yang, Xiangyu [1 ]
Hu, Jian-Qiang [1 ]
Hu, Jiaqiao [2 ]
Peng, Yijie [3 ]
Affiliations
[1] Fudan Univ, Sch Management, Shanghai, Peoples R China
[2] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[3] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SIMULATION;
DOI
10.1109/WSC48552.2020.9384120
Chinese Library Classification (CLC)
TP39 [Computer applications];
Discipline codes
081203; 0835;
Abstract
We propose a simulation-based value iteration algorithm for approximately solving infinite horizon discounted MDPs with continuous state spaces and finite actions. At each time step, the algorithm employs the shrinking ball method to estimate the value function at sampled states and uses historical estimates in an interpolation-based fitting strategy to build an approximator of the optimal value function. Under moderate conditions, we prove that the sequence of approximators generated by the algorithm converges uniformly to the optimal value function with probability one. Simple numerical examples are provided to compare our algorithm with two other existing methods.
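The loop the abstract describes — estimate the value at sampled states via simulation, then fit an approximator of the optimal value function by interpolation — can be illustrated with a generic fitted value iteration sketch. This is only a minimal stand-in, not the paper's algorithm: it uses plain Monte Carlo backups and linear interpolation in place of the shrinking-ball estimator, and the toy MDP (`step`, `reward`, the [0, 1] state space) is invented for illustration.

```python
import numpy as np

def fitted_value_iteration(step, reward, actions, gamma=0.9,
                           n_states=50, n_mc=30, n_iters=60, seed=0):
    """Generic fitted value iteration on the state space [0, 1].

    `step(s, a, rng)` samples a next state; `reward(s, a)` is the
    one-step reward. A simplified sketch of the abstract's scheme:
    simulate backups at sampled states, fit V by interpolation.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_states)   # sampled states
    v = np.zeros(n_states)                   # current approximator values
    for _ in range(n_iters):
        q = np.empty((len(actions), n_states))
        for ai, a in enumerate(actions):
            for si, s in enumerate(grid):
                nxt = np.array([step(s, a, rng) for _ in range(n_mc)])
                # Monte Carlo Bellman backup through the interpolated V
                q[ai, si] = reward(s, a) + gamma * np.interp(nxt, grid, v).mean()
        v = q.max(axis=0)                    # greedy backup at sampled states
    return grid, v

# Toy MDP: action 1 drifts the state up (at a small cost), action 0 drifts
# it down; reward grows with the state, so higher states are more valuable.
def step(s, a, rng):
    drift = 0.1 if a == 1 else -0.1
    return float(np.clip(s + drift + 0.05 * rng.standard_normal(), 0.0, 1.0))

def reward(s, a):
    return s - 0.05 * a

grid, v = fitted_value_iteration(step, reward, actions=[0, 1])
```

Because the reward increases with the state, the fitted value function should come out increasing across the grid; the paper's contribution is proving that a sequence of such approximators (built with its shrinking-ball estimates) converges uniformly to the optimal value function with probability one.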
Pages: 2856-2866
Page count: 11