Reinforcement Learning for Constrained Markov Decision Processes

Cited by: 0

Authors:

Gattami, Ather [1]

Bai, Qinbo [2]

Aggarwal, Vaneet [2]

Affiliations:

[1] AI Sweden, Stockholm, Sweden

[2] Purdue University, West Lafayette, IN 47907, USA

Source:

24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS) | 2021 / Vol. 130

Keywords:

ALGORITHM;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we consider the problem of optimization and learning for constrained and multi-objective Markov decision processes, for both discounted rewards and expected average rewards. We formulate the problems as zero-sum games where one player (the agent) solves a Markov decision problem and its opponent solves a bandit optimization problem, which we here call Markov-Bandit games. We extend Q-learning to solve Markov-Bandit games and show that our new Q-learning algorithms converge to the optimal solutions of the zero-sum Markov-Bandit games, and hence converge to the optimal solutions of the constrained and multi-objective Markov decision problems. We provide numerical examples where we calculate the optimal policies and show by simulations that the algorithm converges to the calculated optimal policies. To the best of our knowledge, this is the first time Q-learning algorithms guarantee convergence to optimal stationary policies for the multi-objective Reinforcement Learning problem with discounted and expected average rewards, respectively.
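
As a rough illustration of the game formulation described in the abstract, the discounted multi-objective problem can be written as a max-min problem. This is only a sketch under assumed notation: the symbols $r^j$, $\delta_j$, $\gamma$, and $V_j$ are introduced here for illustration and are not taken from the record, and the paper's own formulation may differ. It also assumes the maximum over policies is attained.

```latex
% Assumed notation (not from the source): J reward functions r^1, ..., r^J,
% discount factor \gamma \in (0,1), and thresholds \delta_1, ..., \delta_J.
\begin{align*}
V_j(\pi) &= \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r^{j}(s_t, a_t)\right],
  \qquad j = 1, \dots, J, \\
\text{find } \pi \ &\text{such that}\quad V_j(\pi) \ge \delta_j \quad \text{for all } j, \\
\text{feasible iff}\quad & \max_{\pi}\ \min_{j \in \{1, \dots, J\}}
  \bigl( V_j(\pi) - \delta_j \bigr) \;\ge\; 0 .
\end{align*}
```

Read this way, the outer maximization is the agent's Markov decision problem, while the inner minimization is a stateless choice among $J$ indices, which is plausibly why the opponent's problem is described as a bandit problem.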


Pages: 11
