A Sublinear-Regret Reinforcement Learning Algorithm on Constrained Markov Decision Processes with reset action

Times cited: 0
Authors
Watanabe, Takashi [1 ]
Sakuragawa, Takashi [1 ]
Affiliations
[1] Kyoto University, Graduate School of Human and Environmental Studies, Kyoto, Japan
Keywords
reinforcement learning; constrained Markov decision processes; regret analysis; expected average-reward criterion
DOI
10.1145/3380688.3380706
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study model-based reinforcement learning in unknown constrained Markov decision processes (CMDPs) with a reset action. We propose an algorithm, Constrained-UCRL, which maintains confidence intervals as in UCRL2 and solves a linear programming problem to compute a policy at the start of each episode. We show that, with high probability, Constrained-UCRL achieves sublinear regret bounds of Õ(S√A T^{3/4}), up to logarithmic factors, for both the gain and the constraint violations.
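The record contains no pseudocode, so purely as an illustrative sketch (not the authors' implementation), the Python below solves the classical occupancy-measure linear program for an average-reward CMDP whose transition model is known; this is the kind of LP the abstract's per-episode planning step refers to. The helper name solve_cmdp_lp and its arguments P, r, c, and tau are hypothetical, and Constrained-UCRL would additionally replace the known model with a UCRL2-style confidence set, yielding an extended LP that this sketch omits.

```python
# Illustrative sketch only: occupancy-measure LP for an average-reward CMDP
# with a KNOWN model. Constrained-UCRL instead solves an extended LP over a
# confidence set of transition models, rebuilt at the start of each episode.
import numpy as np
from scipy.optimize import linprog

def solve_cmdp_lp(P, r, c, tau):
    """P: (S, A, S) transition kernel; r, c: (S, A) reward and cost; tau: cost budget."""
    S, A = r.shape
    n = S * A  # one variable rho(s, a) per state-action pair

    # Maximize sum_{s,a} rho(s,a) * r(s,a); linprog minimizes, so negate.
    obj = -r.reshape(n)

    # Flow conservation for every state s', plus normalization of rho.
    A_eq = np.zeros((S + 1, n))
    b_eq = np.zeros(S + 1)
    for sp in range(S):  # sum_a rho(s', a) = sum_{s,a} rho(s, a) P(s' | s, a)
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - P[s, a, sp]
    A_eq[S, :] = 1.0  # the occupancy measure sums to one
    b_eq[S] = 1.0

    # Constraint: expected average cost must not exceed the budget tau.
    A_ub = c.reshape(1, n)
    b_ub = np.array([tau])

    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * n, method="highs")
    assert res.success, res.message

    # Recover a stationary policy: pi(a | s) proportional to rho(s, a).
    rho = res.x.reshape(S, A)
    pi = rho / np.maximum(rho.sum(axis=1, keepdims=True), 1e-12)
    return pi, -res.fun  # policy and its optimal expected average reward
```

For communicating CMDPs this LP yields an optimal stationary policy; rows of pi belonging to states with zero occupancy come out as all zeros and may be set arbitrarily.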
Pages: 51 - 55
Number of pages: 5
Related papers
50 records in total (the first 10 are listed below)
  • [1] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021, 130
  • [2] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016: 1289-1290
  • [3] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005: 199-204
  • [4] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Novotny, Petr
    Vahala, Jiri
    Thirty-Fourth AAAI Conference on Artificial Intelligence, the Thirty-Second Innovative Applications of Artificial Intelligence Conference and the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, 2020, 34: 9794-9801
  • [5] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
    Kongzhi yu Juece/Control and Decision, 2004, 19(11): 1263-1266
  • [6] Learning in Constrained Markov Decision Processes
    Singh, Rahul
    Gupta, Abhishek
    Shroff, Ness B.
    IEEE Transactions on Control of Network Systems, 2023, 10(1): 441-453
  • [7] An Inverse Reinforcement Learning Algorithm for semi-Markov Decision Processes
    Tan, Chuanfang
    Li, Yanjie
    Cheng, Yuhu
    2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017: 1256-1261
  • [8] A reinforcement learning based algorithm for finite horizon Markov decision processes
    Bhatnagar, Shalabh
    Abdulla, Mohammed Shahid
    Proceedings of the 45th IEEE Conference on Decision and Control, Vols 1-14, 2006: 5519-5524
  • [9] Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
    Wei, Honghao
    Liu, Xin
    Ying, Lei
    International Conference on Artificial Intelligence and Statistics, 2022, 151
  • [10] Semi-Infinitely Constrained Markov Decision Processes and Provably Efficient Reinforcement Learning
    Zhang, Liangyu
    Peng, Yang
    Yang, Wenhao
    Zhang, Zhihua
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(5): 3722-3735