Sleeping experts and bandits approach to constrained Markov decision processes

Cited by: 0
Authors
Chang, Hyeong Soo [1]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea
Keywords
Markov decision processes; Sleeping experts and bandits; Learning algorithm; Constrained optimization; Sample average approximation; Policies
DOI
10.1016/j.automatica.2015.10.015
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This communique presents simple simulation-based algorithms for obtaining an approximately optimal policy from a given finite set of policies in large finite constrained Markov decision processes. The algorithms are adapted from playing strategies for the "sleeping experts and bandits" problem, and their computational complexities are independent of the state and action space sizes if the given policy set is relatively small. We establish convergence of their expected performances to the value of an optimal policy, together with convergence rates, and also almost-sure convergence to an optimal policy at an exponential rate for the algorithm adapted within the context of sleeping experts. (C) 2015 Elsevier Ltd. All rights reserved.
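The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes a full-information ("experts") setting in which every policy in the finite set is simulated once per round, treats a policy as "awake" when its running average constraint cost meets a bound, and follows the awake policy with the best running average reward. The names `follow_awake_leader`, `toy_simulate`, `toy_policies`, and `cost_bound` are hypothetical, as is the toy simulator itself.

```python
import random


def follow_awake_leader(policies, simulate, cost_bound, rounds, seed=0):
    """Return the index of the empirically best policy that looks feasible.

    Illustrative only: `simulate(policy, rng)` is a hypothetical callback that
    returns one sampled (reward, constraint_cost) pair for a policy.
    """
    rng = random.Random(seed)
    n = len(policies)
    reward_sum = [0.0] * n
    cost_sum = [0.0] * n
    leader = None
    for t in range(1, rounds + 1):
        # Full-information ("experts") round: sample every policy once.
        for i, policy in enumerate(policies):
            reward, cost = simulate(policy, rng)
            reward_sum[i] += reward
            cost_sum[i] += cost
        # Assumption: a policy is "awake" if its running average cost meets the bound.
        awake = [i for i in range(n) if cost_sum[i] / t <= cost_bound]
        # Follow the awake leader: best running average reward among awake policies.
        leader = max(awake, key=lambda i: reward_sum[i] / t) if awake else None
    return leader


def toy_simulate(policy, rng):
    # Hypothetical simulator: a policy is just (mean_reward, mean_cost) plus bounded noise.
    mean_reward, mean_cost = policy
    return mean_reward + rng.uniform(-0.1, 0.1), mean_cost + rng.uniform(-0.1, 0.1)


# Policy 0 has the highest reward but violates the cost bound of 0.5.
toy_policies = [(1.0, 0.9), (0.7, 0.3), (0.4, 0.1)]
best = follow_awake_leader(toy_policies, toy_simulate, cost_bound=0.5, rounds=200)
print(best)  # index of the selected policy, or None if no policy looks feasible
```

In this sketch the work per round grows with the number of candidate policies and the cost of one simulation call, and the loop never enumerates states or actions, which is the sense in which such a scheme can stay cheap when the policy set is small.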
Pages: 182 - 186
Page count: 5
Related papers
50 in total
  • [1] Regret bounds for sleeping experts and bandits
    Robert Kleinberg
    Alexandru Niculescu-Mizil
    Yogeshwer Sharma
    [J]. Machine Learning, 2010, 80 : 245 - 272
  • [2] On constrained Markov decision processes
    Haviv, M
    [J]. OPERATIONS RESEARCH LETTERS, 1996, 19 (01) : 25 - 28
  • [3] Regret bounds for sleeping experts and bandits
    Kleinberg, Robert
    Niculescu-Mizil, Alexandru
    Sharma, Yogeshwer
    [J]. MACHINE LEARNING, 2010, 80 (2-3) : 245 - 272
  • [4] A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization
    Ying, Donghao
    Ding, Yuhao
    Lavaei, Javad
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [5] Learning in Constrained Markov Decision Processes
    Singh, Rahul
    Gupta, Abhishek
    Shroff, Ness B.
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (01) : 441 - 453
  • [6] A Policy Gradient Approach for Finite Horizon Constrained Markov Decision Processes
    Guin, Soumyajit
    Bhatnagar, Shalabh
    [J]. 2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023 : 3353 - 3359
  • [7] Dynamic programming in constrained Markov decision processes
    Piunovskiy, A. B.
    [J]. CONTROL AND CYBERNETICS, 2006, 35 (03) : 645 - 660
  • [8] Robustness of policies in constrained Markov decision processes
    Zadorojniy, A
    Shwartz, A
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2006, 51 (04) : 635 - 638
  • [9] Markov decision processes with constrained stopping times
    Horiguchi, M
    Kurano, M
    Yasuda, M
    [J]. PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2000 : 706 - 710
  • [10] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130