Robust Markov Decision Processes

Cited by: 198
Authors
Wiesemann, Wolfram [1]
Kuhn, Daniel [1]
Rustem, Berc [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2AZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
robust optimization; Markov decision processes; semidefinite programming
DOI
10.1287/moor.1120.0566
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research]
Subject Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use because of their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a prespecified probability 1 - beta. Afterward, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1 - beta. Our method involves the solution of tractable conic programs of moderate size.
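For intuition only, the sketch below illustrates the basic robust-MDP computation on a small finite MDP: the transition kernel is assumed to lie in an ambiguity set around empirical estimates, and robust value iteration scores every action against its worst-case kernel in that set. This is not the paper's method; it replaces the observation-based confidence region with a simple L1 (total-variation) ball of radius eps, assumes (s, a)-rectangular uncertainty, and uses hypothetical function names (worst_case_value, robust_value_iteration). The paper itself constructs confidence regions with coverage 1 - beta and solves tractable conic programs instead of this greedy inner step.

```python
import numpy as np

def worst_case_value(p_hat, v, eps):
    # Minimize p @ v over the L1 ball ||p - p_hat||_1 <= eps intersected with
    # the probability simplex (a stand-in for the paper's confidence region).
    p = np.asarray(p_hat, dtype=float).copy()
    i_min = int(np.argmin(v))
    shift = min(eps / 2.0, 1.0 - p[i_min])  # mass pushed toward the worst successor state
    p[i_min] += shift
    remaining = shift
    # Remove the same amount of mass from the highest-value successor states first.
    for i in np.argsort(v)[::-1]:
        if i == i_min:
            continue
        take = min(p[i], remaining)
        p[i] -= take
        remaining -= take
        if remaining <= 1e-12:
            break
    return float(p @ v)

def robust_value_iteration(P_hat, R, eps, gamma=0.95, tol=1e-8, max_iter=10_000):
    # (s, a)-rectangular robust value iteration: every action is evaluated
    # against its own worst-case transition vector inside the uncertainty set.
    # P_hat: (S, A, S) empirical transition estimates, R: (S, A) rewards.
    n_states, n_actions = R.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        q = np.array([[R[s, a] + gamma * worst_case_value(P_hat[s, a], v, eps)
                       for a in range(n_actions)] for s in range(n_states)])
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)

# Tiny illustrative instance: 2 states, 2 actions.
P_hat = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
v_robust, policy = robust_value_iteration(P_hat, R, eps=0.2)
```

The greedy inner minimization works only because the L1 ball makes the worst-case kernel easy to characterize (shift probability mass toward the lowest-value successor); richer, statistically calibrated confidence regions such as those studied in the paper require conic programming for the inner problem.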
Pages: 153 - 183 (31 pages)