Topological Value Iteration Algorithm for Markov Decision Processes

Cited by: 0

Authors:

Dai, Peng [1]

Goldsmith, Judy [1]

Affiliation:

[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA

Source:

20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2007

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Value Iteration (VI) is an inefficient algorithm for Markov decision processes (MDPs) because it spends most of its effort backing up the entire state space, which is unnecessary in many cases. Many approaches have been proposed to overcome this problem; among them, LAO*, LRTDP and HDP are the state of the art. All of these use reachability analysis and heuristics to avoid some unnecessary backups. However, none of them fully exploits the graphical features of the MDP or uses these features to derive the best backup sequence over the state space. We introduce an algorithm named Topological Value Iteration (TVI) that circumvents the problem of unnecessary backups by detecting the structure of the MDP and backing up states according to topological sequences. We prove that the backup sequence TVI applies is optimal. Our experimental results show that TVI outperforms VI, LAO*, LRTDP and HDP on our benchmark MDPs.
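The idea of backing up states according to the MDP's graph structure can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that "detecting the structure" means splitting the transition graph into strongly connected components, that the MDP is a cost-minimizing problem with zero-cost terminal states, and that every successor state appears as a key of the dictionary. The dictionary layout, the names `strongly_connected_components`, `backup` and `topological_value_iteration`, and the use of Tarjan's algorithm are all hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
# Assumed layout: mdp[s][a] = (cost, [(probability, successor), ...]).
# A state with no actions is treated as terminal with value 0.
MDP = Dict[State, Dict[str, Tuple[float, List[Tuple[float, State]]]]]


def successors(mdp: MDP, s: State) -> set:
    """All states reachable from s in one step under any action."""
    return {t for _, outcomes in mdp[s].values() for _, t in outcomes}


def strongly_connected_components(mdp: MDP) -> List[List[State]]:
    """Tarjan's algorithm (recursive, fine for small graphs).

    Components come out in reverse topological order: a component is
    emitted only after every component it can reach.
    """
    index, low, on_stack = {}, {}, set()
    stack, components = [], []

    def visit(s: State) -> None:
        index[s] = low[s] = len(index)
        stack.append(s)
        on_stack.add(s)
        for t in successors(mdp, s):
            if t not in index:
                visit(t)
                low[s] = min(low[s], low[t])
            elif t in on_stack:
                low[s] = min(low[s], index[t])
        if low[s] == index[s]:  # s is the root of a component
            component = []
            while True:
                t = stack.pop()
                on_stack.discard(t)
                component.append(t)
                if t == s:
                    break
            components.append(component)

    for s in mdp:
        if s not in index:
            visit(s)
    return components


def backup(mdp: MDP, values: Dict[State, float], s: State, gamma: float) -> float:
    """Bellman backup for expected-cost minimization."""
    return min(
        cost + gamma * sum(p * values[t] for p, t in outcomes)
        for cost, outcomes in mdp[s].values()
    )


def topological_value_iteration(mdp: MDP, gamma: float = 1.0,
                                epsilon: float = 1e-9) -> Dict[State, float]:
    """Solve each component to convergence before moving to its predecessors."""
    values = {s: 0.0 for s in mdp}
    for component in strongly_connected_components(mdp):
        while True:
            residual = 0.0
            for s in component:
                if not mdp[s]:
                    continue  # terminal state keeps value 0
                new_value = backup(mdp, values, s, gamma)
                residual = max(residual, abs(new_value - values[s]))
                values[s] = new_value
            if residual <= epsilon:
                break
    return values


# Toy example: "a" and "b" form a cycle; "c" and "goal" are single-state components.
example: MDP = {
    "a": {"move": (1.0, [(1.0, "b")])},
    "b": {"try": (1.0, [(0.5, "a"), (0.5, "c")])},
    "c": {"go": (1.0, [(1.0, "goal")])},
    "goal": {},
}
values = topological_value_iteration(example)
```

In this toy problem only the cyclic component containing "a" and "b" needs repeated sweeps; "goal" and "c" are settled first and never revisited, which is the saving over plain value iteration that the abstract describes.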

Pages: 1860-1865

Page count: 6
