ASYNCHRONOUS VALUE ITERATION FOR MARKOV DECISION PROCESSES WITH CONTINUOUS STATE SPACES

Cited by: 2

Authors:

Yang, Xiangyu [1]

Hu, Jian-Qiang [1]

Hu, Jiaqiao [2]

Peng, Yijie [3]

Affiliations:

[1] Fudan Univ, Sch Management, Shanghai, Peoples R China

[2] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

[3] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China

Source:

2020 WINTER SIMULATION CONFERENCE (WSC) | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

SIMULATION;

DOI:

10.1109/WSC48552.2020.9384120

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

We propose a simulation-based value iteration algorithm for approximately solving infinite horizon discounted MDPs with continuous state spaces and finite actions. At each time step, the algorithm employs the shrinking ball method to estimate the value function at sampled states and uses historical estimates in an interpolation-based fitting strategy to build an approximator of the optimal value function. Under moderate conditions, we prove that the sequence of approximators generated by the algorithm converges uniformly to the optimal value function with probability one. Simple numerical examples are provided to compare our algorithm with two other existing methods.
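The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the one-dimensional state space [0, 1], the toy model `toy_step`, the radius schedule `k ** -0.25`, and the use of linear interpolation via `numpy.interp` are all assumptions made here for illustration. At each step the sketch samples one state, simulates one transition per action, averages the stored one-sample Bellman targets of all past states lying inside a shrinking ball around the new state, and interpolates the historical estimates to evaluate the value function elsewhere.

```python
import numpy as np

def toy_step(x, a, rng):
    """Hypothetical 1-D model on [0, 1]; not taken from the paper."""
    reward = 1.0 - abs(x - 0.5)                  # bounded in [0.5, 1]
    drift = 0.1 if a == 1 else -0.1              # action 1 moves right, 0 left
    y = float(np.clip(x + drift + 0.05 * rng.standard_normal(), 0.0, 1.0))
    return reward, y

def async_vi(step, actions=(0, 1), n_iters=300, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    xs, vs = [], []                              # sampled states and their value estimates
    targets = {a: [] for a in actions}           # one-sample Bellman targets per action

    def v_hat(y):
        # Interpolation-based approximator built from historical estimates.
        if not xs:
            return 0.0
        order = np.argsort(xs)
        return float(np.interp(y, np.asarray(xs)[order], np.asarray(vs)[order]))

    for k in range(1, n_iters + 1):
        x = rng.uniform(0.0, 1.0)                # sample one state per step
        for a in actions:
            r, y = step(x, a, rng)               # one simulated transition per action
            targets[a].append(r + gamma * v_hat(y))
        xs.append(x)
        radius = k ** -0.25                      # assumed shrinking-ball schedule
        in_ball = np.abs(np.asarray(xs) - x) <= radius
        # Average the stored targets of all states inside the ball, then maximize.
        q = [np.mean(np.asarray(targets[a])[in_ball]) for a in actions]
        vs.append(float(max(q)))
    return np.asarray(xs), np.asarray(vs)

xs, vs = async_vi(toy_step)
```

Because the toy rewards lie in [0.5, 1] and the discount factor is 0.9, every estimate in `vs` stays between 0.5 and 1 / (1 - 0.9) = 10, which gives a quick sanity check on the sketch.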


Pages: 2856 - 2866

Number of pages: 11

50 items in total

[41] Geometric Policy Iteration for Markov Decision Processes
Wu, Yue
De Loera, Jesus A.
[J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 2070 - 2078
[42] Policy set iteration for Markov decision processes
Chang, Hyeong Soo
[J]. AUTOMATICA, 2013, 49 (12) : 3687 - 3689
[43] AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity
Zeng, Yibo
Feng, Fei
Yin, Wotao
[J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 713 - 722
[44] Average optimality for continuous-time Markov decision processes with a policy iteration approach
Zhu, Quanxin
[J]. JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2008, 339 (01) : 691 - 704
[45] Semiparametric estimation of Markov decision processes with continuous state space
Srisuma, Sorawoot
Linton, Oliver
[J]. JOURNAL OF ECONOMETRICS, 2012, 166 (02) : 320 - 341
[46] Finite State Approximations of Markov Decision Processes with General State and Action Spaces
Saldi, Naci
Linder, Tamas
Yueksel, Serdar
[J]. 2015 AMERICAN CONTROL CONFERENCE (ACC), 2015, : 3589 - 3594
[47] SERIAL AND PARALLEL VALUE-ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION-PROCESSES
ARCHIBALD, TW
MCKINNON, KIM
THOMAS, LC
[J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1993, 67 (02) : 188 - 203
[48] COMPUTATIONAL COMPARISON OF VALUE-ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION-PROCESSES
THOMAS, LC
HARLEY, R
LAVERCOMBE, AC
[J]. OPERATIONS RESEARCH LETTERS, 1983, 2 (02) : 72 - 76
[49] An optimistic value iteration for mean-variance optimization in discounted Markov decision processes
Ma, Shuai
Ma, Xiaoteng
Xia, Li
[J]. RESULTS IN CONTROL AND OPTIMIZATION, 2022, 8
[50] Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes
Xu, Junhong
Yin, Kai
Liu, Lantao
[J]. ROBOTICS: SCIENCE AND SYSTEMS XVI, 2020,
