Topological Value Iteration Algorithm for Markov Decision Processes

Cited by: 0
Authors
Dai, Peng [1]
Goldsmith, Judy [1]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Value Iteration (VI) is an inefficient algorithm for Markov decision processes (MDPs) because it spends most of its effort backing up the entire state space, which is unnecessary in many cases. Many approaches have been proposed to overcome this problem; among them, LAO*, LRTDP and HDP are the state of the art. All of these use reachability analysis and heuristics to avoid some unnecessary backups. However, none of them fully exploits the graphical features of the MDP or uses these features to yield the best backup sequence of the state space. We introduce an algorithm named Topological Value Iteration (TVI) that circumvents the problem of unnecessary backups by detecting the structure of the MDP and backing up states based on topological sequences. We prove that the backup sequence TVI applies is optimal. Our experimental results show that TVI outperforms VI, LAO*, LRTDP and HDP on our benchmark MDPs.
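The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the MDP is cost-minimizing with terminal states of value 0, "structure" means the strongly connected components of the state-transition graph, and those components are found with Tarjan's algorithm (chosen here because it emits components successors-first, which is the order the backups need). The names `MDP`, `strongly_connected_components`, `backup` and `topological_value_iteration` are hypothetical, and with `gamma=1.0` the loop is assumed to terminate because every state can reach a terminal state.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
# Assumed encoding: mdp[s][a] = list of (probability, next_state, cost).
# A state with no actions is treated as terminal (value 0).
# Every next_state must itself be a key of the dictionary.
MDP = Dict[State, Dict[str, List[Tuple[float, State, float]]]]


def strongly_connected_components(mdp: MDP) -> List[List[State]]:
    """Tarjan's algorithm (recursive). Components are returned successors-first."""
    index, low, on_stack = {}, {}, set()
    stack, components = [], []

    def visit(s):
        index[s] = low[s] = len(index)
        stack.append(s)
        on_stack.add(s)
        for outcomes in mdp[s].values():
            for prob, nxt, _cost in outcomes:
                if prob <= 0.0:
                    continue  # zero-probability outcomes are not edges
                if nxt not in index:
                    visit(nxt)
                    low[s] = min(low[s], low[nxt])
                elif nxt in on_stack:
                    low[s] = min(low[s], index[nxt])
        if low[s] == index[s]:  # s is the root of a component
            comp = []
            while True:
                t = stack.pop()
                on_stack.discard(t)
                comp.append(t)
                if t == s:
                    break
            components.append(comp)

    for s in mdp:
        if s not in index:
            visit(s)
    return components


def backup(mdp: MDP, values: Dict[State, float], s: State, gamma: float) -> float:
    """One Bellman backup: minimum expected cost over the actions of s."""
    if not mdp[s]:
        return 0.0
    return min(
        sum(p * (c + gamma * values[n]) for p, n, c in outcomes)
        for outcomes in mdp[s].values()
    )


def topological_value_iteration(mdp: MDP, gamma: float = 1.0, eps: float = 1e-9):
    """Solve each component to convergence, later components first."""
    values = {s: 0.0 for s in mdp}
    for comp in strongly_connected_components(mdp):
        while True:
            residual = 0.0
            for s in comp:
                new = backup(mdp, values, s, gamma)
                residual = max(residual, abs(new - values[s]))
                values[s] = new
            if residual < eps:
                break
    return values


# Tiny illustrative MDP (made up for this sketch): s0 -> s1 -> goal,
# where s1 has a 50% chance of staying put.
example = {
    "s0": {"go": [(1.0, "s1", 1.0)]},
    "s1": {"try": [(0.5, "s1", 1.0), (0.5, "goal", 1.0)]},
    "goal": {},
}
result = topological_value_iteration(example)
```

In this example only `s1` needs repeated sweeps, because it is the only state with a cycle. `goal` and `s0` each settle after a single effective backup, since every state they depend on has already converged by the time they are processed.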
Pages: 1860-1865
Page count: 6