Robust Markov Decision Processes

Cited by: 218

Authors:

Wiesemann, Wolfram [1]

Kuhn, Daniel [1]

Rustem, Berc [1]

Affiliations:

[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2AZ, England

Source:

MATHEMATICS OF OPERATIONS RESEARCH | 2013 / Vol. 38 / No. 01

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

robust optimization; Markov decision processes; semidefinite programming;

DOI:

10.1287/moor.1120.0566

Chinese Library Classification:

C93 [Management Science]; O22 [Operations Research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use because of their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a prespecified probability 1 - beta. Afterward, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1 - beta. Our method involves the solution of tractable conic programs of moderate size.
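The pipeline in the abstract (estimate the model from an observation history, build a region around the estimate, then optimize worst-case performance over that region) can be illustrated with a minimal sketch. This is not the paper's method: the paper solves conic programs, whereas the sketch below swaps in plain robust value iteration with an L1 ball of fixed radius around each estimated transition distribution. The function names, the `radius` parameter, the discount factor, and the toy counts and rewards are all assumptions made for illustration, and no attempt is made to calibrate `radius` to a confidence level 1 - beta.

```python
def estimate_transitions(counts):
    """Turn observed transition counts[s][a][s_next] into empirical distributions."""
    return [[[c / sum(row) for c in row] for row in per_state] for per_state in counts]


def worst_case_expectation(p_hat, values, radius):
    """Minimize sum(p[i] * values[i]) over distributions p with ||p - p_hat||_1 <= radius."""
    p = list(p_hat)
    lowest = min(range(len(values)), key=lambda i: values[i])
    # Mass that can be shifted onto the lowest-value successor state.
    shift = min(radius / 2.0, 1.0 - p[lowest])
    p[lowest] += shift
    # Remove the same amount of mass from the highest-value successors first.
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        if shift <= 0.0:
            break
        if i == lowest:
            continue
        taken = min(p[i], shift)
        p[i] -= taken
        shift -= taken
    return sum(pi * vi for pi, vi in zip(p, values))


def robust_value_iteration(p_hat, rewards, radius, discount=0.9, tol=1e-8):
    """Iterate the robust Bellman operator until the values stop changing."""
    values = [0.0] * len(p_hat)
    while True:
        new_values = [
            max(
                rewards[s][a]
                + discount * worst_case_expectation(p_hat[s][a], values, radius)
                for a in range(len(p_hat[s]))
            )
            for s in range(len(p_hat))
        ]
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values
        values = new_values


# Hypothetical observation history: counts[s][a][s_next] for 2 states, 2 actions.
counts = [[[9, 1], [5, 5]], [[2, 8], [6, 4]]]
rewards = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical rewards[s][a]

p_hat = estimate_transitions(counts)
nominal_values = robust_value_iteration(p_hat, rewards, radius=0.0)
robust_values = robust_value_iteration(p_hat, rewards, radius=0.3)
```

With `radius=0.0` the sketch reduces to ordinary value iteration on the estimated model; a larger radius can only lower the resulting values, which reflects the trade-off between nominal performance and protection against estimation error.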

Pages: 153-183

Number of pages: 31
