Non-Stationary Markov Decision Processes: a Worst-Case Approach using Model-Based Reinforcement Learning

Cited by: 0
Authors
Lecarpentier, Erwan [1 ]
Rachelson, Emmanuel [2 ]
Affiliations
[1] Univ Toulouse, ONERA French Aerosp Lab, Toulouse, France
[2] Univ Toulouse, ISAE SUPAERO, Toulouse, France
Keywords
POLICY;
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work tackles the problem of robust planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously with a bounded evolution rate; 2) a current model is known at each decision epoch but not its evolution. Our contribution is fourfold: 1) we define a specific class of MDPs that we call Non-Stationary MDPs (NSMDPs), and introduce the notion of regular evolution by making a Lipschitz-continuity hypothesis on the transition and reward functions w.r.t. time; 2) we consider a planning agent using the current model of the environment but unaware of its future evolution, which leads us to a worst-case method where the environment is seen as an adversarial agent; 3) following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm, a Model-Based method similar to minimax search; 4) we empirically illustrate the benefits of RATS and compare its performance with reference Model-Based algorithms.
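The worst-case idea in the abstract — plan with the current model while an adversarial environment drifts within a Lipschitz bound — can be illustrated with a minimax-style finite-horizon lookahead. The sketch below is not the paper's RATS implementation: the toy snapshot model, the integer state/action encoding, and the simplification that the adversary only shifts rewards downward by the maximal admissible drift `l_r * elapsed_time` are all illustrative assumptions.

```python
from typing import Dict, Tuple

# Hypothetical snapshot model of an NSMDP at the current decision epoch:
# maps (state, action) -> (reward, {next_state: probability}).
Model = Dict[Tuple[int, int], Tuple[float, Dict[int, float]]]

def worst_case_value(s: int, d: int, model: Model, actions, horizon: int,
                     gamma: float, l_r: float, dt: float) -> float:
    """Minimax-flavored value estimate: the agent maximizes over actions
    while the adversarial environment applies the worst reward drift the
    Lipschitz bound permits after d elapsed time steps."""
    if d == horizon:
        return 0.0
    drift = l_r * d * dt  # maximal admissible reward drift so far
    best = float("-inf")
    for a in actions:
        reward, transitions = model[(s, a)]
        q = (reward - drift) + gamma * sum(
            p * worst_case_value(s2, d + 1, model, actions, horizon,
                                 gamma, l_r, dt)
            for s2, p in transitions.items()
        )
        best = max(best, q)
    return best

# Tiny two-state, two-action example (deterministic transitions).
model: Model = {
    (0, 0): (1.0, {0: 1.0}),
    (0, 1): (0.0, {1: 1.0}),
    (1, 0): (2.0, {1: 1.0}),
    (1, 1): (0.0, {0: 1.0}),
}
v = worst_case_value(0, 0, model, actions=[0, 1],
                     horizon=3, gamma=0.9, l_r=0.1, dt=1.0)
```

Because future rewards are penalized by the accumulated drift, a risk-averse agent may prefer actions whose value degrades least under the worst admissible evolution, which is the intuition behind treating the environment as an adversary.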
Pages: 10
Related Papers (50 total)
  • [21] Predicting Spectrum Occupancies Using a Non-Stationary Hidden Markov Model
    Chen, Xianfu
    Zhang, Honggang
    MacKenzie, Allen B.
    Matinmikko, Marja
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2014, 3 (04) : 333 - 336
  • [22] RESTARTED BAYESIAN ONLINE CHANGE-POINT DETECTION FOR NON-STATIONARY MARKOV DECISION PROCESSES
    Alami, Reda
    Mahfoud, Mohammed
    Moulines, Eric
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 715 - 744
  • [24] A Model-free Reinforcement Learning Approach for the Energetic Control of a Building with Non-stationary User Behaviour
    Haddam, Nassim
    Boulakia, Benjamin Cohen
    Barth, Dominique
    2020 THE 4TH INTERNATIONAL CONFERENCE ON SMART GRID AND SMART CITIES (ICSGSC 2020), 2020, : 168 - 177
  • [25] An analysis of model-based Interval Estimation for Markov Decision Processes
    Strehl, Alexander L.
    Littman, Michael L.
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2008, 74 (08) : 1309 - 1331
  • [26] Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Autef, Arnaud
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [27] A reinforcement learning based algorithm for finite horizon Markov decision processes
    Bhatnagar, Shalabh
    Abdulla, Mohammed Shahid
    PROCEEDINGS OF THE 45TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2006, : 5519 - 5524
  • [28] Model-based fault diagnosis of induction motors using non-stationary signal segmentation
    Kim, K
    Parlos, AG
    MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2002, 16 (2-3) : 223 - 253
  • [29] Reinforcement learning based algorithms for average cost Markov Decision Processes
    Abdulla, Mohammed Shahid
    Bhatnagar, Shalabh
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2007, 17 (01): 23 - 52