Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions

Authors
Tian, Tian [1]
Young, Kenny
Sutton, Richard S.
Affiliation
[1] Univ Alberta, Edmonton, AB, Canada
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022 | 2022
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single batch is prohibitively expensive if the state space is large, rendering VI impractical for many applications. Asynchronous VI helps to address the large state space problem by updating one state at a time, in place and in an arbitrary order. However, Asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with large action spaces. To address this issue, we propose doubly-asynchronous value iteration (DAVI), a new algorithm that generalizes the idea of asynchrony from states to states and actions. More concretely, DAVI maximizes over a sampled subset of actions that can be of any user-defined size. This simple approach of using sampling to reduce computation retains theoretical properties similar to those of VI without the need to wait for a full sweep through the entire action space in each update. In this paper, we show that DAVI converges to the optimal value function with probability one, converges at a near-geometric rate with probability 1 − δ, and returns a near-optimal policy in computation time that nearly matches a previously established bound for VI. We also empirically demonstrate DAVI's effectiveness in several experiments.
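The abstract describes DAVI only at a high level. The following Python code is a minimal sketch of the core idea, not the paper's exact algorithm: each update picks one state at random (asynchrony in states) and maximizes over a random subset of m actions rather than the whole action space (asynchrony in actions). The tabular `P`/`R` layout, the uniform sampling of states and actions, the function name `davi`, and the `pi` array that re-evaluates the best action found so far alongside the sampled subset are all assumptions made for illustration.

```python
import random


def davi(P, R, gamma, m, num_updates, seed=0):
    """Minimal sketch of a doubly-asynchronous value-iteration loop.

    Assumed (hypothetical) tabular layout: P[s][a] is a list of
    (next_state, probability) pairs and R[s][a] is the expected reward.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    v = [0.0] * n_states  # value estimates, updated in place
    pi = [0] * n_states   # assumption: best action found so far for each state

    def q(s, a):
        # One-step lookahead that reads the current in-place estimates.
        return R[s][a] + gamma * sum(p * v[s2] for s2, p in P[s][a])

    for _ in range(num_updates):
        s = rng.randrange(n_states)                # asynchronous in states
        sampled = rng.sample(range(n_actions), m)  # asynchronous in actions
        candidates = set(sampled) | {pi[s]}        # assumption: keep pi[s]
        scores = {a: q(s, a) for a in candidates}
        pi[s] = max(scores, key=scores.get)        # max over the subset only
        v[s] = scores[pi[s]]
    return v, pi


# Usage: one state, five self-looping actions with rewards 0..4.
P = [[[(0, 1.0)] for _ in range(5)]]
R = [[0.0, 1.0, 2.0, 3.0, 4.0]]
v, pi = davi(P, R, gamma=0.9, m=2, num_updates=2000)
print(v, pi)
```

In the usage example, the reward-4 action is adopted as soon as it appears in a sampled subset, after which v[0] approaches 4 / (1 − 0.9) = 40 and pi[0] becomes 4. Keeping the stored action in the candidate set means that a sampled subset which misses the best known action never discards it, while each update still costs only about m lookahead evaluations instead of one per action.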
Pages: 11