Complexity of finite-horizon Markov decision process problems

Cited by: 79
Authors
Mundhenk, M [1 ]
Goldsmith, J
Lusena, C
Allender, E
Affiliations
[1] Univ Trier, FB Informat 4, D-54286 Trier, Germany
[2] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[3] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08855 USA
Keywords
computational complexity; Markov decision processes; NP; NPPP; partially observable Markov decision processes; PL; PSPACE; succinct representations;
DOI
10.1145/347476.347480
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology];
Discipline code
0812 ;
Abstract
Controlled stochastic systems occur in science, engineering, manufacturing, social sciences, and many other contexts. If the system is modeled as a Markov decision process (MDP) and will run ad infinitum, the optimal control policy can be computed in polynomial time using linear programming. The problems considered here assume that the time the process will run is finite and depends on the size of the input. There are many factors that compound the complexity of computing the optimal policy. For instance, if the controller does not have complete information about the state of the system, or if the system is represented in some very succinct manner, the optimal policy is provably not computable in time polynomial in the size of the input. We analyze the computational complexity of evaluating policies and of determining whether a sufficiently good policy exists for an MDP, based on a number of confounding factors, including the observability of the system state; the succinctness of the representation; the type of policy; even the number of actions relative to the number of states. In almost every case, we show that the decision problem is complete for some known complexity class. Some of these results are familiar from work by Papadimitriou and Tsitsiklis and others, but some, such as our PL-completeness proofs, are surprising. We include proofs of completeness for natural problems in the as-yet little-studied class NPPP.
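To make the finite-horizon setting concrete: for a fully observable MDP given by explicit transition and reward tables, the optimal time-dependent policy can be computed by backward induction (dynamic programming) in time polynomial in the table size and horizon. The sketch below is illustrative only; the two-state, two-action MDP and all names in it are hypothetical and not taken from the paper, whose focus is the complexity of harder variants (partial observability, succinct representations).

```python
# A minimal sketch of backward induction for a fully observable
# finite-horizon MDP with explicit ("flat") tables. Hypothetical example;
# the paper studies the complexity of variants where this approach fails.

def backward_induction(n_states, actions, P, R, horizon):
    """Return optimal values and a time-dependent policy.

    P[a][s][s2] : probability of moving s -> s2 under action a.
    R[a][s]     : immediate reward for taking action a in state s.
    """
    V = [0.0] * n_states              # values with zero stages to go
    policy = []                       # policy[t][s] = best action at time t
    for _ in range(horizon):
        new_v = [0.0] * n_states
        step = [None] * n_states
        for s in range(n_states):
            best_a, best_q = None, float("-inf")
            for a in actions:
                # One-step lookahead: reward now plus expected future value.
                q = R[a][s] + sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
                if q > best_q:
                    best_a, best_q = a, q
            new_v[s], step[s] = best_q, best_a
        V = new_v
        policy.insert(0, step)        # earlier time steps go in front
    return V, policy

# Hypothetical 2-state MDP: action 0 stays put, action 1 gambles 50/50.
P = {0: [[1.0, 0.0], [0.0, 1.0]],
     1: [[0.5, 0.5], [0.5, 0.5]]}
R = {0: [0.0, 1.0],
     1: [0.5, 0.5]}
V, policy = backward_induction(2, [0, 1], P, R, horizon=3)
print(V)        # optimal expected total reward from each start state
```

The loop runs horizon * |S| * |A| * |S| arithmetic steps, i.e. polynomial in the explicit representation; the hardness results in the paper arise precisely when the state space is only given succinctly or is not fully observable.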
Pages: 681 - 720
Page count: 40