Trading Performance for Stability in Markov Decision Processes

Cited by: 12
Authors
Brazdil, Tomas [1 ]
Chatterjee, Krishnendu [2 ]
Forejt, Vojtech [1 ,3 ]
Kucera, Antonin [1 ]
Affiliations
[1] Masaryk Univ, Fac Informat, CS-60177 Brno, Czech Republic
[2] IST Austria, Klosterneuburg, Austria
[3] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK; Austrian Science Fund (FWF)
Keywords
MEAN-VARIANCE TRADEOFFS;
DOI
10.1109/LICS.2013.39
CLC number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
We study the complexity of central controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance of the system and its stability. We argue that the basic theoretical notion of expressing the stability in terms of the variance of the mean-payoff (called global variance in our paper) is not always sufficient, since it ignores possible instabilities on individual runs. For this reason we propose alternative definitions of stability, which we call local and hybrid variance, and which express how rewards on each run deviate from the run's own mean-payoff and from the expected mean-payoff, respectively. We show that a strategy ensuring both the expected mean-payoff and the variance below given bounds requires randomization and memory, under all the above semantics of variance. We then look at the problem of determining whether such a strategy exists. For the global variance, we show that the problem is in PSPACE, and that the answer can be approximated in pseudo-polynomial time. For the hybrid variance, the analogous decision problem is in NP, and a polynomial-time approximation algorithm also exists. For local variance, we show that the decision problem is in NP. Since the overall performance can be traded for stability (and vice versa), we also present algorithms for approximating the associated Pareto curve in all three cases. Finally, we study a special case of the decision problems, where we require a given expected mean-payoff together with zero variance. Here we show that the problems can all be solved in polynomial time.
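To make the abstract's three notions of stability concrete, the following is a minimal sketch of how they can be formalized, following the informal description above; the notation (rewards $r_i$ along a run $\omega$, mean-payoff $\mathrm{mp}$) is an assumption for illustration rather than a quotation from the paper.

\[
  \mathrm{mp}(\omega) \;=\; \liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} r_i(\omega)
\]
\[
  V_{\mathrm{glob}} \;=\; \mathbb{E}\!\big[(\mathrm{mp}-\mathbb{E}[\mathrm{mp}])^2\big], \qquad
  V_{\mathrm{loc}} \;=\; \mathbb{E}\!\Big[\liminf_{n\to\infty}\tfrac{1}{n}\sum_{i=1}^{n}\big(r_i(\omega)-\mathrm{mp}(\omega)\big)^2\Big], \qquad
  V_{\mathrm{hyb}} \;=\; \mathbb{E}\!\Big[\liminf_{n\to\infty}\tfrac{1}{n}\sum_{i=1}^{n}\big(r_i(\omega)-\mathbb{E}[\mathrm{mp}]\big)^2\Big]
\]

Here $V_{\mathrm{glob}}$ measures how the per-run mean-payoff deviates from its expectation, $V_{\mathrm{loc}}$ measures how rewards on a run deviate from that run's own mean-payoff, and $V_{\mathrm{hyb}}$ measures how they deviate from the expected mean-payoff.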
Pages: 331-340
Page count: 10