Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

Cited by: 0

Authors:

Chen, Liyu [1]

Jain, Rahul [1]

Luo, Haipeng [1]

Affiliations:

[1] Univ Southern Calif, Los Angeles, CA 90007 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. We start by designing a policy optimization algorithm with a carefully designed action-value estimator and bonus term, and show that for ergodic MDPs, our algorithm ensures $\widetilde{O}(\sqrt{T})$ regret and constant constraint violation, where $T$ is the total number of time steps. This strictly improves over the algorithm of Singh et al. (2020), whose regret and constraint violation are both $\widetilde{O}(T^{2/3})$. Next, we consider the most general class of weakly communicating MDPs. Through a finite-horizon approximation, we develop another algorithm with $\widetilde{O}(T^{2/3})$ regret and constraint violation, which can be further improved to $\widetilde{O}(\sqrt{T})$ via a simple modification, albeit at the cost of making the algorithm computationally inefficient. As far as we know, these are the first provable algorithms for weakly communicating MDPs with cost constraints.


Pages: 25

50 items in total

[1] Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Wei, Chen-Yu
Jafarnia-Jahromi, Mehdi
Luo, Haipeng
Jain, Rahul
[J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
[2] A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes
Wei, Honghao
Liu, Xin
Ying, Lei
[J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3868 - 3876
[3] Learning and Planning in Average-Reward Markov Decision Processes
Wan, Yi
Naik, Abhishek
Sutton, Richard S.
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7665 - 7676
[4] Robust Average-Reward Markov Decision Processes
Wang, Yue
Velasquez, Alvaro
Atia, George
Prater-Bennette, Ashley
Zou, Shaofeng
[J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 15215 - 15223
[5] Average-Reward Decentralized Markov Decision Processes
Petrik, Marek
Zilberstein, Shlomo
[J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1997 - 2002
[6] REVERSIBLE MARKOV DECISION PROCESSES WITH AN AVERAGE-REWARD CRITERION
Cogill, Randy
Peng, Cheng
[J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2013, 51 (01) : 402 - 418
[7] Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness
Xiong, Guojun
Wang, Shufan
Li, Jian
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[8] Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Wu, Yue
Zhou, Dongruo
Gu, Quanquan
[J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
[9] Relative Q-Learning for Average-Reward Markov Decision Processes with Continuous States
Yang, Xiangyu
Hu, Jiaqiao
Hu, Jian-Qiang
[J]. IEEE Transactions on Automatic Control, 2024, 69 (10) : 6546 - 6560
[10] Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
Zhang, Zihan
Xie, Qiaomin
[J]. THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
