Stability-constrained Markov Decision Processes using MPC

Cited by: 6
Authors
Zanon, Mario [1 ]
Gros, Sebastien [2 ]
Palladino, Michele [3 ]
Affiliations
[1] IMT Sch Adv Studies Lucca, Piazza San Francesco 19, I-55100 Lucca, Italy
[2] NTNU, Trondheim, Norway
[3] Univ Aquila, Dept Informat Engn Comp Sci & Math DISIM, via Vetoio, I-67100 L'Aquila, Italy
Keywords
Markov Decision Processes; Model Predictive Control; Stability; Safe reinforcement learning; MODEL-PREDICTIVE CONTROL; SYSTEMS;
DOI
10.1016/j.automatica.2022.110399
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice, MDPs are solved using some form of policy approximation. We leverage recent results proposing Model Predictive Control (MPC) as a structured function approximator in the context of Reinforcement Learning, which makes it possible to introduce stability requirements directly into the MPC-based policy. This restricts the solution of the MDP to stabilizing policies by construction. Because the stability theory for MPC is most mature in the undiscounted case, we first show that stable discounted MDPs can be reformulated as undiscounted ones. This observation entails that the undiscounted MPC-based policy with stability guarantees produces the optimal policy for the discounted MDP if that policy is stable, and the best stabilizing policy otherwise. (C) 2022 Elsevier Ltd. All rights reserved.
Pages: 9