Sleeping experts and bandits approach to constrained Markov decision processes

Citations: 0

Authors:

Chang, Hyeong Soo [1]

Affiliations:

[1] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea

Source:

AUTOMATICA | 2016 / Vol. 63

Keywords:

Markov decision processes; Sleeping experts and bandits; Learning algorithm; Constrained optimization; SAMPLE AVERAGE APPROXIMATION; POLICIES;

DOI:

10.1016/j.automatica.2015.10.015

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This communique presents simple simulation-based algorithms for obtaining an approximately optimal policy in a given finite set in large finite constrained Markov decision processes. The algorithms are adapted from playing strategies for the "sleeping experts and bandits" problem, and their computational complexities are independent of state and action space sizes if the given policy set is relatively small. We establish convergence of their expected performances to the value of an optimal policy and convergence rates, and also almost-sure convergence to an optimal policy with an exponential rate for the algorithm adapted within the context of sleeping experts. © 2015 Elsevier Ltd. All rights reserved.

Pages: 182-186

Number of pages: 5