Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

Cited by: 0
Authors
Chen, Liyu [1]
Jain, Rahul [1]
Luo, Haipeng [1]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. We start by designing a policy optimization algorithm with a carefully designed action-value estimator and bonus term, and show that for ergodic MDPs, our algorithm ensures $\tilde{O}(\sqrt{T})$ regret and constant constraint violation, where $T$ is the total number of time steps. This strictly improves over the algorithm of Singh et al. (2020), whose regret and constraint violation are both $\tilde{O}(T^{2/3})$. Next, we consider the most general class of weakly communicating MDPs. Through a finite-horizon approximation, we develop another algorithm with $\tilde{O}(T^{2/3})$ regret and constraint violation, which can be further improved to $\tilde{O}(\sqrt{T})$ via a simple modification, albeit making the algorithm computationally inefficient. As far as we know, these are the first set of provable algorithms for weakly communicating MDPs with cost constraints.
Pages: 25
Related papers
  • [1] Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
    Wei, Chen-Yu
    Jafarnia-Jahromi, Mehdi
    Luo, Haipeng
    Jain, Rahul
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [2] A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes
    Wei, Honghao
    Liu, Xin
    Ying, Lei
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3868 - 3876
  • [3] Learning and Planning in Average-Reward Markov Decision Processes
    Wan, Yi
    Naik, Abhishek
    Sutton, Richard S.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7665 - 7676
  • [4] Robust Average-Reward Markov Decision Processes
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 15215 - 15223
  • [5] Average-Reward Decentralized Markov Decision Processes
    Petrik, Marek
    Zilberstein, Shlomo
    [J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1997 - 2002
  • [6] REVERSIBLE MARKOV DECISION PROCESSES WITH AN AVERAGE-REWARD CRITERION
    Cogill, Randy
    Peng, Cheng
    [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2013, 51 (01) : 402 - 418
  • [7] Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness
    Xiong, Guojun
    Wang, Shufan
    Li, Jian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [8] Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
    Wu, Yue
    Zhou, Dongruo
    Gu, Quanquan
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [9] Relative Q-Learning for Average-Reward Markov Decision Processes with Continuous States
    Yang, Xiangyu
    Hu, Jiaqiao
    Hu, Jian-Qiang
    [J]. IEEE Transactions on Automatic Control, 2024, 69 (10) : 6546 - 6560
  • [10] Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
    Zhang, Zihan
    Xie, Qiaomin
    [J]. THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195