Non-Stationary Markov Decision Processes: A Worst-Case Approach Using Model-Based Reinforcement Learning

Cited by: 0
Authors
Lecarpentier, Erwan [1]
Rachelson, Emmanuel [2]
Affiliations
[1] Universite de Toulouse, ONERA - The French Aerospace Lab, Toulouse, France
[2] Universite de Toulouse, ISAE-SUPAERO, Toulouse, France
Keywords
POLICY;
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work tackles the problem of robust planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) that evolve over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously with a bounded evolution rate; 2) a current model of the environment is known at each decision epoch, but not its future evolution. Our contribution has four parts: 1) we define a specific class of MDPs that we call Non-Stationary MDPs (NSMDPs) and introduce the notion of regular evolution through a Lipschitz-continuity hypothesis on the transition and reward functions with respect to time; 2) we consider a planning agent that uses the current model of the environment but is unaware of its future evolution, which leads us to a worst-case method in which the environment is treated as an adversarial agent; 3) following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm, a model-based method similar to minimax search; 4) we illustrate the benefits of RATS empirically and compare its performance with reference model-based algorithms.
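The regular-evolution hypothesis admits a natural formalization. The block below is one plausible reading of the abstract's Lipschitz-continuity assumption; the choice of the 1-Wasserstein metric for comparing transition models and the constants L_p, L_r are assumptions not stated in the abstract itself.

```latex
% One plausible formalization of the regular-evolution hypothesis
% (the 1-Wasserstein metric W_1 for transitions is an assumed choice):
% for all states s, actions a, and times t, t',
\[
  W_1\bigl(p_t(\cdot \mid s, a),\, p_{t'}(\cdot \mid s, a)\bigr) \le L_p \, |t - t'|,
  \qquad
  \bigl| r_t(s, a) - r_{t'}(s, a) \bigr| \le L_r \, |t - t'| .
\]
```

The worst-case planning idea can likewise be illustrated with a minimal minimax-style recursion. The sketch below is not the authors' RATS algorithm: the snapshot-model interface (reward, transitions) is hypothetical, and only the reward is perturbed adversarially, whereas a full worst-case treatment would also perturb the transition model within a ball of radius L_p times the elapsed time.

```python
def worst_case_value(snapshot, state, depth, horizon, gamma, L_r, actions):
    """Minimax-style recursion sketch: the agent maximizes over actions while the
    non-stationary environment is treated as an adversary that may shift the
    snapshot reward by at most L_r * (elapsed time), per the Lipschitz hypothesis.

    `snapshot` is a hypothetical interface exposing
        reward(state, action) -> float
        transitions(state, action) -> list of (prob, next_state) pairs
    and stands for the current model known at the decision epoch.
    """
    if depth == horizon:
        return 0.0
    elapsed = depth + 1  # time elapsed since the snapshot model was observed
    best = float("-inf")
    for a in actions:
        # Adversarial (worst-case) reward within the Lipschitz ball of radius L_r * elapsed.
        r = snapshot.reward(state, a) - L_r * elapsed
        # Simplification for the sketch: transition probabilities are kept at their
        # snapshot values; a full treatment would also minimize over transition
        # models within a Wasserstein ball of radius L_p * elapsed.
        q = r + gamma * sum(
            p * worst_case_value(snapshot, s2, depth + 1, horizon, gamma, L_r, actions)
            for p, s2 in snapshot.transitions(state, a)
        )
        best = max(best, q)
    return best
```

The recursion returns a pessimistic value estimate for `state`; acting greedily with respect to it yields a risk-averse policy in the spirit of the worst-case approach described above (assuming a non-empty action set and a finite horizon).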
Pages: 10