A Sublinear-Regret Reinforcement Learning Algorithm on Constrained Markov Decision Processes with Reset Action

Cited by: 0
Authors
Watanabe, Takashi [1 ]
Sakuragawa, Takashi [1 ]
Affiliations
[1] Kyoto Univ, Grad Sch Human & Environm Studies, Kyoto, Japan
Keywords
reinforcement learning; constrained Markov decision processes; regret analysis; expected average-reward criterion;
DOI
10.1145/3380688.3380706
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we study model-based reinforcement learning in unknown constrained Markov decision processes (CMDPs) with a reset action. We propose an algorithm, Constrained-UCRL, which uses confidence intervals as in UCRL2 and solves a linear programming problem to compute a policy at the start of each episode. We show that, with high probability, Constrained-UCRL achieves sublinear regret bounds of Õ(S√A·T^{3/4}), up to logarithmic factors, for both the gain and the constraint violations.
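The per-episode linear program mentioned in the abstract can be sketched with the standard occupancy-measure formulation for average-reward CMDPs: maximize the expected average reward over stationary occupancy measures subject to flow-balance, normalization, and a budget on the expected average cost. The toy transition kernel, rewards, costs, and budget below are illustrative assumptions, not taken from the paper, and the sketch solves the LP for a known model rather than over the confidence set used by Constrained-UCRL.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy CMDP: 2 states, 2 actions (all numbers are illustrative).
S, A = 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s, a, s'] transition kernel
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.0],                  # reward r[s, a]
              [0.5, 2.0]])
cost = np.array([[0.0, 1.0],               # constraint cost c[s, a]
                 [0.2, 1.5]])
d = 0.5                                    # average-cost budget

n = S * A  # decision variables: occupancy measure q[s, a], flattened

# Flow balance: for each s', sum_a q(s', a) = sum_{s,a} P[s, a, s'] q(s, a).
A_eq = np.zeros((S + 1, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] -= P[s, a, sp]
    for a in range(A):
        A_eq[sp, sp * A + a] += 1.0
A_eq[S, :] = 1.0                           # normalization: sum q = 1
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

# Maximize average reward subject to average cost <= d.
res = linprog(c=-r.flatten(),
              A_ub=cost.flatten()[None, :], b_ub=[d],
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
q = res.x.reshape(S, A)
pi = q / q.sum(axis=1, keepdims=True)      # induced stationary policy pi(a|s)
```

Extracting the policy as the conditional distribution of the occupancy measure is the usual final step; Constrained-UCRL additionally inflates the model with UCRL2-style confidence intervals before solving, which this sketch omits.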
Pages: 51-55 (5 pages)