Robust Markov Decision Processes

Cited by: 198
Authors
Wiesemann, Wolfram [1]
Kuhn, Daniel [1]
Rustem, Berc [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2AZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
robust optimization; Markov decision processes; semidefinite programming
DOI
10.1287/moor.1120.0566
Chinese Library Classification (CLC)
C93 [Management Science]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use because of their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a prespecified probability 1 - β. Afterward, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1 - β. Our method involves the solution of tractable conic programs of moderate size.
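The max-min idea in the abstract (choose the policy with the best worst-case value over a set of plausible transition models) can be illustrated with a small sketch. This is not the paper's method: the abstract describes confidence regions derived from an observation history and solved via conic programs, whereas the sketch below assumes a simpler, commonly used stand-in, namely an independent L1 ball of radius kappa around each estimated transition row, solved by robust value iteration. The names worst_case_expectation, robust_value_iteration, P_hat, R, kappa, and gamma are all hypothetical, and kappa is a free parameter here rather than a radius calibrated to the prespecified confidence level.

```python
def worst_case_expectation(p_hat, v, kappa):
    """Minimize sum(p[i] * v[i]) over distributions p with ||p - p_hat||_1 <= kappa.

    Assumption: an L1-ball ambiguity set, not the paper's confidence region.
    """
    p = list(p_hat)
    lo = min(range(len(v)), key=lambda i: v[i])
    # At most kappa / 2 probability mass can be moved to the lowest-value state.
    budget = min(kappa / 2.0, 1.0 - p[lo])
    p[lo] += budget
    # Remove the same amount of mass from the highest-value states first.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if budget <= 0.0:
            break
        if i == lo:
            continue
        take = min(p[i], budget)
        p[i] -= take
        budget -= take
    return sum(pi * vi for pi, vi in zip(p, v))


def robust_value_iteration(P_hat, R, kappa, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Discounted robust value iteration.

    P_hat[s][a] is the estimated next-state distribution, R[s][a] the reward.
    Returns the worst-case value function and a greedy policy.
    """
    n_states = len(P_hat)
    v = [0.0] * n_states
    q = [[0.0] * len(P_hat[s]) for s in range(n_states)]
    for _ in range(max_iter):
        q = [[R[s][a] + gamma * worst_case_expectation(P_hat[s][a], v, kappa)
              for a in range(len(P_hat[s]))]
             for s in range(n_states)]
        v_new = [max(row) for row in q]
        converged = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break
    policy = [max(range(len(q[s])), key=lambda a: q[s][a]) for s in range(n_states)]
    return v, policy


# Hypothetical two-state, two-action MDP with estimated transitions.
P_hat = [[[0.9, 0.1], [0.5, 0.5]],
         [[0.2, 0.8], [0.6, 0.4]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]

v_nominal, _ = robust_value_iteration(P_hat, R, kappa=0.0)
v_robust, policy_robust = robust_value_iteration(P_hat, R, kappa=0.4)
```

Setting kappa to 0 recovers ordinary value iteration on the estimated model, so comparing v_nominal with v_robust shows how much value the policy gives up to hedge against estimation error under this assumed ambiguity set.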
Pages: 153-183
Page count: 31
Related papers
50 in total
  • [1] Distributionally Robust Markov Decision Processes
    Xu, Huan
    Mannor, Shie
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (02) : 288 - 300
  • [2] Robust Anytime Learning of Markov Decision Processes
    Suilen, Marnix
    Simao, Thiago D.
    Parker, David
    Jansen, Nils
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [3] Distributionally Robust Counterpart in Markov Decision Processes
    Yu, Pengqian
    Xu, Huan
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2016, 61 (09) : 2538 - 2543
  • [4] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1325 - 1353
  • [5] On the Convex Formulations of Robust Markov Decision Processes
    Grand-Clement, Julien
    Petrik, Marek
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2024,
  • [6] Robust Markov Decision Processes: Beyond Rectangularity
    Goyal, Vineet
    Grand-Clement, Julien
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (01) : 203 - 226
  • [7] Policy iteration for robust nonstationary Markov decision processes
    Sinha, Saumya
    Ghate, Archis
    [J]. Optimization Letters, 2016, 10 : 1613 - 1628
  • [8] Policy Gradient for Rectangular Robust Markov Decision Processes
    Kumar, Navdeep
    Derman, Esther
    Geist, Matthieu
    Levy, Kfir
    Mannor, Shie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Robust Average-Reward Markov Decision Processes
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 15215 - 15223
  • [10] Policy iteration for robust nonstationary Markov decision processes
    Sinha, Saumya
    Ghate, Archis
    [J]. OPTIMIZATION LETTERS, 2016, 10 (08) : 1613 - 1628