Robust Average-Reward Markov Decision Processes

Cited by: 0

Authors:

Wang, Yue [1]

Velasquez, Alvaro [2]

Atia, George [3]

Prater-Bennette, Ashley [4]

Zou, Shaofeng [1]

Affiliations:

[1] Univ Buffalo State Univ New York, Buffalo, NY 14222 USA

[2] Univ Colorado, Boulder, CO 80309 USA

[3] Univ Cent Florida, Orlando, FL 32816 USA

[4] Air Force Res Lab, Wright Patterson AFB, OH USA

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12 | 2023

Funding:

National Science Foundation (USA)

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor goes to 1, and moreover when it is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
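The robust relative value iteration idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm as published; it assumes a finite, (s,a)-rectangular uncertainty set (a short list of candidate transition rows per state-action pair) and strictly positive transition probabilities so that the iteration converges. The function name `robust_rvi`, its parameters, and the toy two-state model are all hypothetical.

```python
def robust_rvi(rewards, kernels, ref_state=0, tol=1e-10, max_iter=10_000):
    """Sketch of robust relative value iteration (illustrative assumptions only).

    rewards[s][a]  : immediate reward for action a in state s
    kernels[s][a]  : list of candidate next-state distributions for (s, a);
                     this finite list is the assumed uncertainty set
    Returns (gain, relative_values, greedy_policy).
    """
    n_states = len(rewards)
    v = [0.0] * n_states
    gain, policy = 0.0, [0] * n_states
    for _ in range(max_iter):
        w, policy = [], []
        for s in range(n_states):
            best_val, best_a = None, None
            for a, r in enumerate(rewards[s]):
                # Worst case over the uncertainty set for this (s, a) pair.
                worst = min(
                    sum(p_j * v_j for p_j, v_j in zip(p, v))
                    for p in kernels[s][a]
                )
                q = r + worst
                if best_val is None or q > best_val:
                    best_val, best_a = q, a
            w.append(best_val)
            policy.append(best_a)
        # Subtract the value at a reference state to keep iterates bounded.
        gain = w[ref_state]
        v_new = [x - gain for x in w]
        diff = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if diff < tol:
            break
    return gain, v, policy


# Hypothetical two-state, two-action model.
rewards = [[1.0, 0.0], [0.0, 2.0]]
nominal = [
    [[[0.9, 0.1]], [[0.1, 0.9]]],
    [[[0.9, 0.1]], [[0.1, 0.9]]],
]
# Each uncertainty set contains the nominal row plus one perturbed row.
robust = [
    [[[0.9, 0.1], [0.7, 0.3]], [[0.1, 0.9], [0.3, 0.7]]],
    [[[0.9, 0.1], [0.7, 0.3]], [[0.1, 0.9], [0.3, 0.7]]],
]

nominal_gain, _, _ = robust_rvi(rewards, nominal)
robust_gain, robust_v, robust_policy = robust_rvi(rewards, robust)
print("nominal gain:", nominal_gain)
print("robust gain:", robust_gain, "policy:", robust_policy)
```

Because each uncertainty set here contains the nominal row, the worst-case gain cannot exceed the nominal gain. At convergence the returned pair approximately satisfies a robust Bellman equation of the form v(s) + g = max_a [ r(s,a) + min_p p·v ], with v fixed to 0 at the reference state.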


Pages: 15215-15223

Number of pages: 9

