Robust Average-Reward Markov Decision Processes

Cited by: 0
Authors
Wang, Yue [1 ]
Velasquez, Alvaro [2 ]
Atia, George [3 ]
Prater-Bennette, Ashley [4 ]
Zou, Shaofeng [1 ]
Affiliations
[1] Univ Buffalo State Univ New York, Buffalo, NY 14222 USA
[2] Univ Colorado, Boulder, CO 80309 USA
[3] Univ Cent Florida, Orlando, FL 32816 USA
[4] Air Force Res Lab, Wright Patterson AFB, OH USA
Funding
National Science Foundation (NSF), USA;
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average reward as the discount factor goes to 1, and moreover, that when the discount factor is sufficiently large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward MDP. We further design a robust dynamic programming approach and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly, without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
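The robust relative value iteration described in the abstract can be sketched numerically. The sketch below assumes an R-contamination uncertainty set (one common choice in the robust MDP literature; the paper itself treats general uncertainty sets), for which the worst-case expectation has a closed form: the adversary moves a δ-fraction of probability mass to the state with the smallest value. All specific numbers (the kernel `P`, rewards `r`, δ) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def robust_rvi(P, r, delta=0.1, tol=1e-8, max_iter=10_000):
    """Robust relative value iteration sketch under R-contamination.

    P: (S, A, S) nominal transition kernel; r: (S, A) rewards.
    Uncertainty set: {(1 - delta) * p + delta * q : q any distribution},
    so the worst-case expectation of h is (1-delta) p^T h + delta * min_s h(s).
    Returns (worst-case average-reward estimate g, relative value h, greedy policy).
    """
    S, A = r.shape
    h = np.zeros(S)
    for _ in range(max_iter):
        # Worst-case expectation of h over the contamination set (closed form).
        exp_h = (1 - delta) * (P @ h) + delta * h.min()   # shape (S, A)
        Q = r + exp_h
        Th = Q.max(axis=1)
        g = Th[0]                 # gain estimate at reference state 0
        h_new = Th - g            # subtract the reference value (relative VI)
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return g, h, Q.argmax(axis=1)

# Toy 2-state, 2-action MDP (illustrative numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
g, h, pi = robust_rvi(P, r, delta=0.1)
print(g, h, pi)
```

Because the contamination term `delta * h.min()` is constant across states, it cancels in relative values, and the `(1 - delta)` factor makes the update a span contraction here, so the iteration converges geometrically on this example.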
Pages: 15215-15223 (9 pages)
Related papers
50 records in total
  • [1] Average-Reward Decentralized Markov Decision Processes
    Petrik, Marek
    Zilberstein, Shlomo
    [J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1997 - 2002
  • [2] Learning and Planning in Average-Reward Markov Decision Processes
    Wan, Yi
    Naik, Abhishek
    Sutton, Richard S.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7665 - 7676
  • [3] REVERSIBLE MARKOV DECISION PROCESSES WITH AN AVERAGE-REWARD CRITERION
    Cogill, Randy
    Peng, Cheng
    [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2013, 51 (01) : 402 - 418
  • [4] Incremental Improvements of Heuristic Policies for Average-Reward Markov Decision Processes
    Reveliotis, S.
    Ibrahim, M.
    [J]. IFAC PAPERSONLINE, 2020, 53 (02): : 1721 - 1728
  • [5] NECESSARY CONDITIONS FOR THE OPTIMALITY EQUATION IN AVERAGE-REWARD MARKOV DECISION-PROCESSES
Cavazos-Cadena, R.
    [J]. APPLIED MATHEMATICS AND OPTIMIZATION, 1989, 19 (01): : 97 - 112
  • [6] A Duality Approach for Regret Minimization in Average-Reward Ergodic Markov Decision Processes
    Gong, Hao
    Wang, Mengdi
    [J]. LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 862 - 883
  • [7] Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
    Chen, Liyu
    Jain, Rahul
    Luo, Haipeng
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [8] Relative Q-Learning for Average-Reward Markov Decision Processes with Continuous States
    Yang, Xiangyu
    Hu, Jiaqiao
    Hu, Jian-Qiang
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6546 - 6560
  • [9] Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
    Zhang, Zihan
    Xie, Qiaomin
    [J]. THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [10] Achieving target state-action frequencies in multichain average-reward Markov decision processes
    Krass, D
    Vrieze, OJ
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2002, 27 (03) : 545 - 566