Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration

Cited by: 0
Authors
Akian, Marianne [1, 2]
Gaubert, Stephane [1, 2]
Qu, Zheng [3]
Saadi, Omar [1, 2]
Affiliations
[1] Ecole Polytech, INRIA, F-91128 Palaiseau, France
[2] Ecole Polytech, CMAP, Route Saclay, F-91128 Palaiseau, France
[3] Univ Hong Kong, Dept Math, Room 419,Run Run Shaw Bldg,Pokfulam Rd, Hong Kong, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state that is accessible from every initial state under every policy. Our method is based on a reduction from the mean-payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm combines the techniques developed by Sidford et al. in the discounted case with non-linear spectral theory techniques (the Collatz-Wielandt characterization of the eigenvalue).
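The variance reduction idea that the abstract attributes to Sidford et al. can be illustrated with a minimal sketch for the discounted case only. It is not the mean-payoff algorithm of this paper and omits the Doob h-transform and deflation steps. The function names, the nested-list layout `P[i][a][j]` and `r[i][a]`, and the parameters `epochs`, `inner` and `samples` are all assumptions made for illustration. Each epoch computes the products `P v0` exactly once, then estimates only the correction `P (v - v0)` by sampling, which has small variance when `v` stays close to `v0`.

```python
import random


def exact_bellman(P, r, gamma, v):
    """One exact value-iteration step for a discounted MDP.

    P[i][a][j] is the probability of moving from state i to state j
    under action a, and r[i][a] is the immediate reward.
    """
    return [
        max(r[i][a] + gamma * sum(p * x for p, x in zip(P[i][a], v))
            for a in range(len(r[i])))
        for i in range(len(r))
    ]


def variance_reduced_vi(P, r, gamma, epochs=10, inner=20, samples=200, seed=0):
    """Illustrative variance-reduced value iteration (hypothetical helper).

    Assumption: this follows only the general idea described in the
    abstract, not the exact procedure or parameters of any paper.
    """
    rng = random.Random(seed)
    n = len(r)
    states = list(range(n))
    v0 = [0.0] * n
    for _ in range(epochs):
        # Exact offsets P[i][a] . v0, computed once per epoch.
        offset = [[sum(p * x for p, x in zip(P[i][a], v0))
                   for a in range(len(r[i]))] for i in range(n)]
        v = list(v0)
        for _ in range(inner):
            # Only the difference v - v0 is estimated by sampling.
            diff = [x - y for x, y in zip(v, v0)]
            new_v = []
            for i in range(n):
                q_values = []
                for a in range(len(r[i])):
                    nxt = rng.choices(states, weights=P[i][a], k=samples)
                    est = sum(diff[j] for j in nxt) / samples
                    q_values.append(r[i][a] + gamma * (offset[i][a] + est))
                new_v.append(max(q_values))
            v = new_v
        v0 = v  # the next epoch re-centres on the current iterate
    return v0
```

On a small MDP, the output of `variance_reduced_vi` can be compared against repeated calls to `exact_bellman`; the two should agree up to sampling error, which shrinks as `samples` and `epochs` grow.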
Pages: 5963-5970
Page count: 8
Related papers
50 in total
  • [1] Zero-sum ergodic stochastic games
    Jaskiewicz, Anna
    Nowak, Andrzej S.
    [J]. 2005 44TH IEEE CONFERENCE ON DECISION AND CONTROL & EUROPEAN CONTROL CONFERENCE, VOLS 1-8, 2005, : 1741 - 1746
  • [2] Heuristic Search Value Iteration for Zero-Sum Stochastic Games
    Buffet, Olivier
    Dibangoye, Jilles
    Saffidine, Abdallah
    Thomas, Vincent
    [J]. IEEE TRANSACTIONS ON GAMES, 2021, 13 (03) : 239 - 248
  • [3] Variance reduced value iteration and faster algorithms for solving Markov decision processes
    Sidford, Aaron
    Wang, Mengdi
    Wu, Xian
    Ye, Yinyu
    [J]. NAVAL RESEARCH LOGISTICS, 2023, 70 (05) : 423 - 442
  • [4] Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
    Sidford, Aaron
    Wang, Mengdi
    Wu, Xian
    Ye, Yinyu
    [J]. SODA'18: PROCEEDINGS OF THE TWENTY-NINTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2018, : 770 - 787
  • [5] On the optimality equation for zero-sum ergodic stochastic games
    Jaskiewicz, A
    Nowak, AS
    [J]. MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2001, 54 (02) : 291 - 301
  • [6] Value set iteration for two-person zero-sum Markov games
    Chang, Hyeong Soo
    [J]. AUTOMATICA, 2017, 76 : 61 - 64
  • [7] On the optimality equation for zero-sum ergodic stochastic games
    Jaśkiewicz A.
    Nowak A.S.
    [J]. Mathematical Methods of Operations Research, 2001, 54 (2) : 291 - 301
  • [8] ZERO-SUM GAMES WITH ALMOST PERFECT INFORMATION
    PONSSARD, JP
    [J]. MANAGEMENT SCIENCE SERIES A-THEORY, 1975, 21 (07): : 794 - 805
  • [9] Zero-sum Stochastic Games with Asymmetric Information
    Kartik, Dhruva
    Nayyar, Ashutosh
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4061 - 4066
  • [10] AN ACCRETIVE OPERATOR APPROACH TO ERGODIC ZERO-SUM STOCHASTIC GAMES
    Hochart, Antoine
    [J]. JOURNAL OF DYNAMICS AND GAMES, 2019, 6 (01): : 27 - 51