Non-Stationary Markov Decision Processes: A Worst-Case Approach Using Model-Based Reinforcement Learning

Cited by: 0
Authors
Lecarpentier, Erwan [1]
Rachelson, Emmanuel [2]
Affiliations
[1] Universite de Toulouse, ONERA - The French Aerospace Lab, Toulouse, France
[2] Universite de Toulouse, ISAE-SUPAERO, Toulouse, France
Keywords
POLICY;
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work tackles the problem of robust planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) that evolve over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously with a bounded evolution rate; 2) a current model of the environment is known at each decision epoch, but not its future evolution. Our contribution has four parts: 1) we define a specific class of MDPs that we call Non-Stationary MDPs (NSMDPs) and introduce the notion of regular evolution through a Lipschitz-continuity hypothesis on the transition and reward functions with respect to time; 2) we consider a planning agent that uses the current model of the environment but is unaware of its future evolution, which leads us to a worst-case method in which the environment is treated as an adversarial agent; 3) following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm, a model-based method similar to minimax search; 4) we illustrate the benefits of RATS empirically and compare its performance with reference model-based algorithms.
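The regular-evolution hypothesis admits a natural formalization. The block below is one plausible reading of the abstract's Lipschitz-continuity assumption; the choice of the 1-Wasserstein metric for comparing transition models and the constants L_p, L_r are assumptions not stated in the abstract itself.

```latex
% One plausible formalization of the regular-evolution hypothesis
% (the 1-Wasserstein metric W_1 for transitions is an assumed choice):
% for all states s, actions a, and times t, t',
\[
  W_1\bigl(p_t(\cdot \mid s, a),\, p_{t'}(\cdot \mid s, a)\bigr) \le L_p \, |t - t'|,
  \qquad
  \bigl| r_t(s, a) - r_{t'}(s, a) \bigr| \le L_r \, |t - t'| .
\]
```

The worst-case planning idea can likewise be illustrated with a minimal minimax-style recursion. The sketch below is not the authors' RATS algorithm: the snapshot-model interface (reward, transitions) is hypothetical, and only the reward is perturbed adversarially, whereas a full worst-case treatment would also perturb the transition model within a ball of radius L_p times the elapsed time.

```python
def worst_case_value(snapshot, state, depth, horizon, gamma, L_r, actions):
    """Minimax-style recursion sketch: the agent maximizes over actions while the
    non-stationary environment is treated as an adversary that may shift the
    snapshot reward by at most L_r * (elapsed time), per the Lipschitz hypothesis.

    `snapshot` is a hypothetical interface exposing
        reward(state, action) -> float
        transitions(state, action) -> list of (prob, next_state) pairs
    and stands for the current model known at the decision epoch.
    """
    if depth == horizon:
        return 0.0
    elapsed = depth + 1  # time elapsed since the snapshot model was observed
    best = float("-inf")
    for a in actions:
        # Adversarial (worst-case) reward within the Lipschitz ball of radius L_r * elapsed.
        r = snapshot.reward(state, a) - L_r * elapsed
        # Simplification for the sketch: transition probabilities are kept at their
        # snapshot values; a full treatment would also minimize over transition
        # models within a Wasserstein ball of radius L_p * elapsed.
        q = r + gamma * sum(
            p * worst_case_value(snapshot, s2, depth + 1, horizon, gamma, L_r, actions)
            for p, s2 in snapshot.transitions(state, a)
        )
        best = max(best, q)
    return best
```

The recursion returns a pessimistic value estimate for `state`; acting greedily with respect to it yields a risk-averse policy in the spirit of the worst-case approach described above (assuming a non-empty action set and a finite horizon).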
Pages: 10