A Sublinear-Regret Reinforcement Learning Algorithm on Constrained Markov Decision Processes with reset action

Times cited: 0
Authors
Watanabe, Takashi [1 ]
Sakuragawa, Takashi [1 ]
Affiliations
[1] Kyoto University, Graduate School of Human and Environmental Studies, Kyoto, Japan
Keywords
reinforcement learning; constrained Markov decision processes; regret analysis; expected average-reward criterion
DOI
10.1145/3380688.3380706
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study model-based reinforcement learning in unknown constrained Markov decision processes (CMDPs) with a reset action. We propose an algorithm, Constrained-UCRL, which maintains confidence intervals as in UCRL2 and solves a linear programming problem to compute a policy at the start of each episode. We show that, with high probability, Constrained-UCRL achieves sublinear regret bounds of Õ(S√A T^{3/4}), up to logarithmic factors, for both the gain and the constraint violations.
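The record contains no pseudocode, so purely as an illustrative sketch (not the authors' implementation), the Python below solves the classical occupancy-measure linear program for an average-reward CMDP whose transition model is known; this is the kind of LP the abstract's per-episode planning step refers to. The helper name solve_cmdp_lp and its arguments P, r, c, and tau are hypothetical, and Constrained-UCRL would additionally replace the known model with a UCRL2-style confidence set, yielding an extended LP that this sketch omits.

```python
# Illustrative sketch only: occupancy-measure LP for an average-reward CMDP
# with a KNOWN model. Constrained-UCRL instead solves an extended LP over a
# confidence set of transition models, rebuilt at the start of each episode.
import numpy as np
from scipy.optimize import linprog

def solve_cmdp_lp(P, r, c, tau):
    """P: (S, A, S) transition kernel; r, c: (S, A) reward and cost; tau: cost budget."""
    S, A = r.shape
    n = S * A  # one variable rho(s, a) per state-action pair

    # Maximize sum_{s,a} rho(s,a) * r(s,a); linprog minimizes, so negate.
    obj = -r.reshape(n)

    # Flow conservation for every state s', plus normalization of rho.
    A_eq = np.zeros((S + 1, n))
    b_eq = np.zeros(S + 1)
    for sp in range(S):  # sum_a rho(s', a) = sum_{s,a} rho(s, a) P(s' | s, a)
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - P[s, a, sp]
    A_eq[S, :] = 1.0  # the occupancy measure sums to one
    b_eq[S] = 1.0

    # Constraint: expected average cost must not exceed the budget tau.
    A_ub = c.reshape(1, n)
    b_ub = np.array([tau])

    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * n, method="highs")
    assert res.success, res.message

    # Recover a stationary policy: pi(a | s) proportional to rho(s, a).
    rho = res.x.reshape(S, A)
    pi = rho / np.maximum(rho.sum(axis=1, keepdims=True), 1e-12)
    return pi, -res.fun  # policy and its optimal expected average reward
```

For communicating CMDPs this LP yields an optimal stationary policy; rows of pi belonging to states with zero occupancy come out as all zeros and may be set arbitrarily.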
Pages: 51 - 55
Number of pages: 5
Related papers
50 records in total (the first 10 are listed below)
  • [1] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021, 130
  • [2] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016: 1289-1290
  • [3] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005: 199-204
  • [4] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Novotny, Petr
    Vahala, Jiri
    Thirty-Fourth AAAI Conference on Artificial Intelligence, the Thirty-Second Innovative Applications of Artificial Intelligence Conference and the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, 2020, 34: 9794-9801
  • [5] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
    Kongzhi yu Juece/Control and Decision, 2004, 19(11): 1263-1266
  • [6] Learning in Constrained Markov Decision Processes
    Singh, Rahul
    Gupta, Abhishek
    Shroff, Ness B.
    IEEE Transactions on Control of Network Systems, 2023, 10(1): 441-453
  • [7] An Inverse Reinforcement Learning Algorithm for semi-Markov Decision Processes
    Tan, Chuanfang
    Li, Yanjie
    Cheng, Yuhu
    2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017: 1256-1261
  • [8] A reinforcement learning based algorithm for finite horizon Markov decision processes
    Bhatnagar, Shalabh
    Abdulla, Mohammed Shahid
    Proceedings of the 45th IEEE Conference on Decision and Control, Vols 1-14, 2006: 5519-5524
  • [9] Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
    Wei, Honghao
    Liu, Xin
    Ying, Lei
    International Conference on Artificial Intelligence and Statistics, 2022, 151
  • [10] Semi-Infinitely Constrained Markov Decision Processes and Provably Efficient Reinforcement Learning
    Zhang, Liangyu
    Peng, Yang
    Yang, Wenhao
    Zhang, Zhihua
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(5): 3722-3735