Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions

Cited by: 0
Authors
Tian, Tian [1]
Young, Kenny
Sutton, Richard S.
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022 | 2022
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single batch is prohibitively expensive if the state space is large, rendering VI impractical for many applications. Asynchronous VI helps to address the large state space problem by updating one state at a time, in place and in an arbitrary order. However, asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with a large action space. To address this issue, we propose doubly-asynchronous value iteration (DAVI), a new algorithm that generalizes the idea of asynchrony from states to states and actions. More concretely, DAVI maximizes over a sampled subset of actions that can be of any user-defined size. This simple approach of using sampling to reduce computation retains theoretical properties similarly appealing to those of VI, without the need to wait for a full sweep through the entire action space in each update. In this paper, we show that DAVI converges to the optimal value function with probability one, converges at a near-geometric rate with probability 1 - δ, and returns a near-optimal policy in computation time that nearly matches a previously established bound for VI. We also empirically demonstrate DAVI's effectiveness in several experiments.
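The idea in the abstract, updating one state at a time while maximizing over only a sampled subset of actions, can be illustrated with a minimal sketch. This is not the authors' reference implementation. The function name `davi_sketch`, the tabular model format (`P`, `R`), uniform sampling of states and actions, and the choice to always keep each state's best action found so far among the candidates are all assumptions made for illustration; the abstract itself only says that the subset is sampled and can be of any user-defined size.

```python
import random


def davi_sketch(P, R, gamma, num_updates, subset_size, seed=0):
    """Illustrative doubly-asynchronous value iteration on a tabular MDP.

    Assumed (hypothetical) model format:
      P[s][a] is a list of (probability, next_state) pairs,
      R[s][a] is the expected immediate reward.
    """
    rng = random.Random(seed)
    n_states = len(P)
    n_actions = len(P[0])
    v = [0.0] * n_states   # value estimates, updated in place
    pi = [0] * n_states    # best action found so far for each state

    def q(s, a):
        # One-step lookahead using the current value estimates.
        return R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])

    for _ in range(num_updates):
        # Asynchronous in states: update a single sampled state.
        s = rng.randrange(n_states)
        # Asynchronous in actions: consider only a sampled subset,
        # plus the incumbent action (an assumption of this sketch).
        candidates = set(rng.sample(range(n_actions), subset_size))
        candidates.add(pi[s])
        best = max(candidates, key=lambda a: q(s, a))
        v[s] = q(s, best)
        pi[s] = best
    return v, pi


# Toy deterministic MDP (hypothetical): 2 states, 3 actions.
P = [
    [[(1.0, 0)], [(1.0, 1)], [(1.0, 0)]],  # state 0: stay, go to 1, stay
    [[(1.0, 1)], [(1.0, 1)], [(1.0, 1)]],  # state 1: always stay
]
R = [
    [0.0, 0.0, 1.0],
    [3.0, 0.0, 0.0],
]
v, pi = davi_sketch(P, R, gamma=0.5, num_updates=2000, subset_size=1)
```

For this toy model the optimal values can be worked out by hand: state 1 earns 3 per step, so its value is 3 / (1 - 0.5) = 6, and state 0 does best by moving to state 1, giving 0.5 × 6 = 3. Each update evaluates at most `subset_size + 1` actions instead of all of them, which is the computational saving the abstract describes.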
Pages: 11