ASYNCHRONOUS VALUE ITERATION FOR MARKOV DECISION PROCESSES WITH CONTINUOUS STATE SPACES

Times cited: 2

Authors:

Yang, Xiangyu [1]

Hu, Jian-Qiang [1]

Hu, Jiaqiao [2]

Peng, Yijie [3]

Affiliations:

[1] Fudan Univ, Sch Management, Shanghai, Peoples R China

[2] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

[3] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China

Source:

2020 WINTER SIMULATION CONFERENCE (WSC) | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

SIMULATION;

DOI:

10.1109/WSC48552.2020.9384120

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203; 0835;

Abstract:

We propose a simulation-based value iteration algorithm for approximately solving infinite-horizon discounted MDPs with continuous state spaces and finite action sets. At each time step, the algorithm employs the shrinking ball method to estimate the value function at sampled states and uses historical estimates in an interpolation-based fitting strategy to build an approximator of the optimal value function. Under moderate conditions, we prove that the sequence of approximators generated by the algorithm converges uniformly to the optimal value function with probability one. Simple numerical examples are provided to compare our algorithm with two existing methods.
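The two ingredients named in the abstract, a shrinking-ball estimate and an interpolation-based approximator, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a one-dimensional state space [0, 1], a fixed grid of interpolation nodes with piecewise-linear interpolation (`np.interp`), one simulated transition per action per iteration, and a ball radius of c·k^(-alpha). The function name `shrinking_ball_vi` and the parameters `n_nodes`, `c`, `alpha`, and `seed` are hypothetical. For simplicity the sketch evaluates the ball average at the grid nodes rather than at the sampled states, and it re-evaluates every stored transition with the current approximator at each step.

```python
import numpy as np


def shrinking_ball_vi(reward, step, actions, gamma, n_iter,
                      n_nodes=11, c=0.5, alpha=0.3, seed=0):
    """Hypothetical sketch, not the paper's algorithm.

    State space [0, 1]; `reward(x, a)` returns a float and `step(x, a, rng)`
    returns a simulated next state in [0, 1]. Returns the interpolation nodes
    and the fitted value estimates at those nodes.
    """
    rng = np.random.default_rng(seed)
    nodes = np.linspace(0.0, 1.0, n_nodes)   # fixed interpolation nodes (assumption)
    values = np.zeros(n_nodes)               # V_0 = 0 at every node
    xs = np.empty(n_iter)                    # history of sampled states
    ys = np.empty((len(actions), n_iter))    # simulated next state, per action
    rs = np.empty((len(actions), n_iter))    # one-step reward, per action

    for k in range(1, n_iter + 1):
        # Sample a new state and simulate one transition for every action.
        x = rng.uniform(0.0, 1.0)
        xs[k - 1] = x
        for j, a in enumerate(actions):
            ys[j, k - 1] = step(x, a, rng)
            rs[j, k - 1] = reward(x, a)

        # Bellman targets for all stored transitions, evaluated with the
        # current approximator V_{k-1} (piecewise-linear interpolation).
        targets = rs[:, :k] + gamma * np.interp(ys[:, :k], nodes, values)

        # Shrinking ball: radius c * k**(-alpha) tends to 0 as k grows.
        radius = c * k ** (-alpha)
        new_values = values.copy()
        for i, g in enumerate(nodes):
            in_ball = np.abs(xs[:k] - g) <= radius
            if in_ball.any():  # empty ball: keep the previous estimate
                # Average targets per action inside the ball, then maximize.
                new_values[i] = targets[:, in_ball].mean(axis=1).max()
        values = new_values
    return nodes, values


if __name__ == "__main__":
    # Hypothetical toy MDP: action 1 drifts right, action 0 drifts left;
    # the reward is highest near the middle of [0, 1].
    def demo_reward(x, a):
        return 1.0 - abs(x - 0.5)

    def demo_step(x, a, rng):
        drift = 0.1 if a == 1 else -0.1
        return float(np.clip(x + drift + 0.05 * rng.standard_normal(), 0.0, 1.0))

    nodes, values = shrinking_ball_vi(demo_reward, demo_step, [0, 1],
                                      gamma=0.9, n_iter=2000)
    for g, v in zip(nodes, values):
        print(f"V({g:.1f}) ~ {v:.3f}")
```

A quick sanity check: if action 1 always pays reward 1 and action 0 pays 0, the optimal value is 1/(1 - gamma) at every state, so with gamma = 0.5 the fitted node values should approach 2. Re-evaluating stored transitions with the current approximator keeps early, inaccurate estimates from biasing the ball average; the paper's own use of historical estimates may differ.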


Pages: 2856-2866

Number of pages: 11
