Stability-constrained Markov Decision Processes using MPC

Cited by: 6

Authors:

Zanon, Mario [1]

Gros, Sebastien [2]

Palladino, Michele [3]

Affiliations:

[1] IMT Sch Adv Studies Lucca, Piazza San Francesco 19, I-55100 Lucca, Italy

[2] NTNU, Trondheim, Norway

[3] Univ Aquila, Dept Informat Engn Comp Sci & Math DISIM, via Vetoio, I-67100 L'Aquila, Italy

Source:

AUTOMATICA | 2022 / Vol. 143

Keywords:

Markov Decision Processes; Model Predictive Control; Stability; Safe reinforcement learning; MODEL-PREDICTIVE CONTROL; SYSTEMS;

DOI:

10.1016/j.automatica.2022.110399

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice, MDPs are solved based on some form of policy approximation. We will leverage recent results proposing to use Model Predictive Control (MPC) as a structured approximator in the context of Reinforcement Learning, which makes it possible to introduce stability requirements directly inside the MPC-based policy. This will restrict the solution of the MDP to stabilizing policies by construction. Because the stability theory for MPC is most mature for the undiscounted MPC case, we will first show in this paper that stable discounted MDPs can be reformulated as undiscounted ones. This observation will entail that the undiscounted MPC-based policy with stability guarantees will produce the optimal policy for the discounted MDP if it is stable, and the best stabilizing policy otherwise. © 2022 Elsevier Ltd. All rights reserved.


Pages: 9

50 items in total

[31] Constrained discounted semi-Markov decision processes
Feinberg, EA
[J]. MARKOV PROCESSES AND CONTROLLED MARKOV CHAINS, 2002, : 233 - 244
[32] Joint chance-constrained Markov decision processes
V Varagapriya
Vikas Vikram Singh
Abdel Lisser
[J]. Annals of Operations Research, 2023, 322 : 1013 - 1035
[33] Trading performance for stability in Markov decision processes
Brazdil, Tomas
Chatterjee, Krishnendu
Forejt, Vojtech
Kucera, Antonin
[J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2017, 84 : 144 - 170
[34] Trading Performance for Stability in Markov Decision Processes
Brazdil, Tomas
Chatterjee, Krishnendu
Forejt, Vojtech
Kucera, Antonin
[J]. 2013 28TH ANNUAL IEEE/ACM SYMPOSIUM ON LOGIC IN COMPUTER SCIENCE (LICS), 2013, : 331 - 340
[35] Stability Estimation of Transient Markov Decision Processes
Gordienko, Evgueni
Martinez, Jaime
Ruiz de Chavez, Juan
[J]. XI SYMPOSIUM ON PROBABILITY AND STOCHASTIC PROCESSES, 2015, 69 : 157 - 176
[36] Stability-Constrained Power System Scheduling: A Review
Luo, Jianqiang
Teng, Fei
Bu, Siqi
[J]. IEEE ACCESS, 2020, 8 : 219331 - 219343
[37] Transient stability-constrained maximum allowable transfer
Bettiol, AL
Wehenkel, L
Pavella, M
[J]. IEEE TRANSACTIONS ON POWER SYSTEMS, 1999, 14 (02) : 654 - 659
[38] Constrained Multiagent Markov Decision Processes: a Taxonomy of Problems and Algorithms
de Nijs, Frits
Walraven, Erwin
de Weerdt, Mathijs M.
Spaan, Matthijs T. J.
[J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2021, 70 : 955 - 1001
[39] Constrained Markov Decision Processes with Total Expected Cost Criteria
Altman, Eitan
Boularouk, Said
Josselin, Didier
[J]. PROCEEDINGS OF THE 12TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2019), 2019, : 191 - 192
[40] An actor-critic algorithm for constrained Markov decision processes
Borkar, VS
[J]. SYSTEMS & CONTROL LETTERS, 2005, 54 (03) : 207 - 213
