On the Convergence of Techniques that Improve Value Iteration

Cited: 0
Authors
Grzes, Marek [1]
Hoey, Jesse [1]
Affiliations
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Source
2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2013
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Prioritising Bellman backups and updating only a small subset of actions are important techniques for speeding up planning in MDPs, and the recent literature has introduced efficient approaches that exploit both directions. Backward value iteration and backing up only the best actions were shown to significantly reduce planning time. This paper presents a theoretical and empirical analysis of these techniques together with several new proofs. In particular, it (1) identifies weaker requirements for the convergence of backups based on best actions only, (2) derives a new way of evaluating the Bellman error for updates that back up a single best action once, (3) gives a convergence proof for backward value iteration and establishes the required initialisation, and (4) shows that the default state ordering of backups in standard value iteration can significantly influence its performance. Additionally, (5) the existing literature has not compared these methods, either empirically or analytically, against policy iteration. The rigorous empirical and novel theoretical parts of the paper reveal important relationships and allow guidelines to be drawn on which type of value or policy iteration is suitable for a given domain. Finally, our chief message is that standard value iteration can be made far more efficient by the simple modifications presented in the paper.
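For orientation, the following is a minimal, hypothetical sketch (not the authors' code) of in-place Gauss-Seidel value iteration over a finite MDP, written to show where the techniques discussed in the abstract intervene: the sweep order of states (items 3 and 4) and the set of actions maximised over in each backup (items 1 and 2). The function name, array shapes, and parameters are illustrative assumptions.

```python
import numpy as np

def gauss_seidel_value_iteration(P, R, gamma, order, tol=1e-6, max_sweeps=1000):
    """In-place (Gauss-Seidel) value iteration for a finite MDP.

    P     : transition probabilities, shape (S, A, S)
    R     : expected rewards, shape (S, A)
    gamma : discount factor in (0, 1)
    order : sequence of state indices giving the sweep order; the paper argues
            this ordering can strongly affect the number of backups required
    """
    S, A, _ = P.shape
    V = np.zeros(S)  # initialisation; backward VI requires a suitable one (item 3)
    for _ in range(max_sweeps):
        bellman_error = 0.0
        for s in order:
            # Full Bellman backup of state s over all actions.
            q = R[s] + gamma * P[s] @ V   # Q-values for state s, shape (A,)
            new_v = q.max()               # greedy value; a "best actions only" backup
                                          # would restrict this maximisation (items 1-2)
            bellman_error = max(bellman_error, abs(new_v - V[s]))
            V[s] = new_v                  # updated value is reused later in the same sweep
        if bellman_error < tol:
            break
    return V
```

Under these assumptions, sweeping states in an order informed by the problem structure (e.g. backwards from goal states, as backward value iteration does) tends to propagate values in fewer sweeps than an arbitrary default ordering.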
Pages: 8
Related Papers
50 items in total
  • [31] Speeding up the convergence of value iteration in partially observable Markov decision processes
    Zhang, NL
    Zhang, WH
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 14 : 29 - 51
  • [32] Accelerated Value Iteration for Nonlinear Zero-Sum Games with Convergence Guarantee
    Wang, Yuan
    Zhao, Mingming
    Liu, Nan
    Wang, Ding
    GUIDANCE NAVIGATION AND CONTROL, 2024, 04 (01)
  • [33] VALUE ITERATION CONVERGENCE OF ε-MONOTONE SCHEMES FOR STATIONARY HAMILTON-JACOBI EQUATIONS
    Bokanowski, Olivier
    Falcone, Maurizio
    Ferretti, Roberto
    Gruene, Lars
    Kalise, Dante
    Zidani, Hasnaa
    DISCRETE AND CONTINUOUS DYNAMICAL SYSTEMS, 2015, 35 (09) : 4041 - 4070
  • [34] Comparison of Linear Iteration Schemes to Improve the Convergence of Iterative Physical Optics for an Impedance Scatterer
    Yoo, Jeong-Un
    Koh, Il-Suek
    JOURNAL OF ELECTROMAGNETIC ENGINEERING AND SCIENCE, 2023, 23 (01): : 78 - 80
  • [35] A Multi-step Iteration Scheme to Improve the Convergence of Loop-tree Matrix
    Liu, Y. A.
    Chew, W. C.
    2008 IEEE ANTENNAS AND PROPAGATION SOCIETY INTERNATIONAL SYMPOSIUM, VOLS 1-9, 2008, : 3403 - +
  • [36] Value iteration
    Chatterjee, Krishnendu
    Henzinger, Thomas A.
    25 YEARS OF MODEL CHECKING: HISTORY, ACHIEVEMENTS, PERSPECTIVES, 2008, 5000 : 107 - 138
  • [37] Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Convergence Analysis
    Wei, Qinglai
    Lewis, Frank L.
    Liu, Derong
    Song, Ruizhuo
    Lin, Hanquan
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (06): : 875 - 891
  • [38] UNCONDITIONAL CONVERGENCE OF AN ITERATION PROCESS
    HSU, LC
    NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY, 1973, 20 (06): : A577 - A577
  • [39] On the convergence of optimistic policy iteration
    Tsitsiklis, JN
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (01) : 59 - 72
  • [40] Convergence of the modified Newton-type iteration method for the generalized absolute value equation
    Fang, Ximing
    Huang, Minhai
    ARABIAN JOURNAL OF MATHEMATICS, 2025, : 29 - 37