Reinforcement Learning in Robust Markov Decision Processes

Cited by: 35
Authors
Lim, Shiau Hong [1 ]
Xu, Huan [2 ]
Mannor, Shie [3 ]
Affiliations
[1] IBM Research Collaboratory, Singapore
[2] National University of Singapore, Department of Industrial & Systems Engineering, Singapore
[3] Technion, Department of Electrical Engineering, Haifa, Israel
Funding
European Research Council; Israel Science Foundation
Keywords
robust MDP; reinforcement learning;
DOI
10.1287/moor.2016.0779
Chinese Library Classification
C93 [Management]; O22 [Operations Research]
Discipline Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
An important challenge in Markov decision processes (MDPs) is to ensure robustness with respect to unexpected or adversarial system behavior. A standard paradigm for tackling this challenge is the robust MDP framework, which models the parameters as arbitrary elements of predefined "uncertainty sets" and seeks the minimax policy: the policy that performs best under the worst realization of the parameters in the uncertainty set. A crucial issue of the robust MDP framework, largely unaddressed in the literature, is how to find an appropriate description of the uncertainty in a principled, data-driven way. In this paper we address this problem using an online learning approach: we devise an algorithm that, without knowing the true uncertainty model, adapts its level of protection to the uncertainty and in the long run performs as well as the minimax policy computed with knowledge of the true uncertainty model. Indeed, the algorithm achieves regret bounds similar to those of a standard MDP in which no parameter is adversarial, which shows that robust learning can be adapted to handle uncertainty in MDPs at virtually no extra cost. To the best of our knowledge, this is the first attempt to learn uncertainty in robust MDPs.
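As a concrete illustration of the minimax objective described in the abstract, the following is a minimal sketch of robust value iteration for a finite MDP whose transition probabilities are only known to lie in per-(state, action) interval uncertainty sets. The interval form of the uncertainty set and the names worst_case_distribution and robust_value_iteration are assumptions made for this sketch; the paper itself goes further and learns the appropriate level of protection online, without being handed the uncertainty set.

import numpy as np

def worst_case_distribution(p_lo, p_hi, values):
    # Nature's inner minimization for an interval uncertainty set (assumed
    # form, for illustration): start from the lower bounds and push the
    # remaining probability mass toward the lowest-value successor states.
    p = p_lo.copy()
    budget = 1.0 - p.sum()
    for s_next in np.argsort(values):          # worst successor states first
        add = min(p_hi[s_next] - p_lo[s_next], budget)
        p[s_next] += add
        budget -= add
        if budget <= 1e-12:
            break
    return p

def robust_value_iteration(rewards, p_lo, p_hi, gamma=0.95, iters=1000, tol=1e-8):
    # rewards: (S, A) array; p_lo, p_hi: (S, A, S) interval bounds with
    # p_lo.sum(-1) <= 1 <= p_hi.sum(-1). Returns the robust value function
    # and the minimax (robust) policy.
    S, A = rewards.shape
    V = np.zeros(S)
    Q = np.empty((S, A))
    for _ in range(iters):
        for s in range(S):
            for a in range(A):
                p = worst_case_distribution(p_lo[s, a], p_hi[s, a], V)
                Q[s, a] = rewards[s, a] + gamma * (p @ V)
        V_new = Q.max(axis=1)                  # agent maximizes against the worst case
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)

# Toy usage: a 2-state, 2-action MDP with +/-0.1 interval uncertainty.
nominal = np.array([[[0.9, 0.1], [0.2, 0.8]],
                    [[0.5, 0.5], [0.1, 0.9]]])
p_lo = np.clip(nominal - 0.1, 0.0, 1.0)
p_hi = np.clip(nominal + 0.1, 0.0, 1.0)
rewards = np.array([[1.0, 0.0], [0.0, 2.0]])
V, policy = robust_value_iteration(rewards, p_lo, p_hi)
print("robust values:", V, "robust policy:", policy)

The inner routine implements nature's minimization by shifting all slack probability mass toward the lowest-value successor states; the outer loop then lets the agent maximize over actions against that worst case, which is exactly the minimax structure of the robust Bellman backup.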
Pages: 1325-1353
Number of pages: 29
Related Papers
50 records in total
  • [1] Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Autef, Arnaud
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [2] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [3] Robust Anytime Learning of Markov Decision Processes
    Suilen, Marnix
    Simao, Thiago D.
    Parker, David
    Jansen, Nils
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    [J]. 2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005: 199 - 204
  • [5] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    [J]. MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14 : 261 - 283
  • [6] Reinforcement learning of non-Markov decision processes
    Whitehead, SD
    Lin, LJ
    [J]. ARTIFICIAL INTELLIGENCE, 1995, 73 (1-2) : 271 - 306
  • [7] From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning
    Xi-Ren Cao
    [J]. Discrete Event Dynamic Systems, 2003, 13 : 9 - 39
  • [8] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
    [J]. Kongzhi yu Juece/Control and Decision, 2004, 19 (11): 1263 - 1266
  • [9] From perturbation analysis to Markov decision processes and reinforcement learning
    Cao, XR
    [J]. DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2003, 13 (1-2): : 9 - 39
  • [10] Reinforcement Learning for Cost-Aware Markov Decision Processes
    Suttle, Wesley A.
    Zhang, Kaiqing
    Yang, Zhuoran
    Kraemer, David N.
    Liu, Ji
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139