ASYNCHRONOUS VALUE ITERATION FOR MARKOV DECISION PROCESSES WITH CONTINUOUS STATE SPACES

Cited by: 2

Authors:

Yang, Xiangyu [1]

Hu, Jian-Qiang [1]

Hu, Jiaqiao [2]

Peng, Yijie [3]

Affiliations:

[1] Fudan Univ, Sch Management, Shanghai, Peoples R China

[2] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

[3] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China

Source:

2020 WINTER SIMULATION CONFERENCE (WSC) | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

SIMULATION;

DOI:

10.1109/WSC48552.2020.9384120

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

We propose a simulation-based value iteration algorithm for approximately solving infinite horizon discounted MDPs with continuous state spaces and finite actions. At each time step, the algorithm employs the shrinking ball method to estimate the value function at sampled states and uses historical estimates in an interpolation-based fitting strategy to build an approximator of the optimal value function. Under moderate conditions, we prove that the sequence of approximators generated by the algorithm converges uniformly to the optimal value function with probability one. Simple numerical examples are provided to compare our algorithm with two other existing methods.
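The loop described in the abstract (sample a state, estimate its value from nearby simulated data, refit an interpolated approximator) can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy dynamics in `step`, the reward, the radius schedule `k ** -0.25`, the fixed interpolation grid, and every name below are assumptions chosen only for illustration.

```python
import numpy as np


def step(x, a, rng):
    """Hypothetical toy dynamics on [0, 1]: drift by the action plus noise."""
    nxt = float(np.clip(x + 0.1 * a + 0.05 * rng.standard_normal(), 0.0, 1.0))
    reward = -(x - 0.5) ** 2  # assumed reward: stay near the midpoint
    return nxt, reward


def run(n_iter=300, gamma=0.9, seed=0):
    """Asynchronous, simulation-based value iteration on a toy 1-D MDP."""
    rng = np.random.default_rng(seed)
    actions = (-1.0, 1.0)
    grid = np.linspace(0.0, 1.0, 21)   # interpolation nodes (assumed)
    values = np.zeros_like(grid)       # current approximator at the nodes
    spacing = grid[1] - grid[0]
    # Per-action history of (state, reward, next state) samples.
    hist = {a: {"x": [], "r": [], "n": []} for a in actions}

    for k in range(1, n_iter + 1):
        x = rng.uniform()              # sample one state per iteration
        for a in actions:
            nxt, r = step(x, a, rng)   # one simulated transition per action
            hist[a]["x"].append(x)
            hist[a]["r"].append(r)
            hist[a]["n"].append(nxt)

        # Ball radius shrinks with k; the floor keeps the ball non-empty.
        radius = max(k ** -0.25, spacing)
        i = int(np.argmin(np.abs(grid - x)))  # update only the nearest node

        q_values = []
        for a in actions:
            xs = np.asarray(hist[a]["x"])
            rs = np.asarray(hist[a]["r"])
            ns = np.asarray(hist[a]["n"])
            mask = np.abs(xs - grid[i]) <= radius
            # Average historical one-step returns from states inside the ball,
            # evaluating next states with the interpolated approximator.
            v_next = np.interp(ns[mask], grid, values)
            q_values.append(float(np.mean(rs[mask] + gamma * v_next)))
        values[i] = max(q_values)

    return grid, values


if __name__ == "__main__":
    grid, values = run()
    print(np.round(values, 3))
```

Updating a single node per iteration is what makes this sketch asynchronous. The piecewise-linear `np.interp` fit stands in for whatever interpolation scheme the paper actually uses.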


Pages: 2856-2866

Number of pages: 11
