Offline reinforcement learning in high-dimensional stochastic environments

Cited by: 1
Authors
Heche, Felicien [1 ,2 ]
Barakat, Oussama [2 ]
Desmettre, Thibaut [4 ]
Marx, Tania [3 ]
Robert-Nicoud, Stephan [1 ]
Affiliations
[1] Univ Appl Sci & Arts Western Switzerland, Sch Engn & Management, Yverdon, Switzerland
[2] Univ Franche Comte, Nanomed Lab Imagery & Therapeut, F-25000 Besancon, France
[3] Ctr Hospitalier Univ, Emergency Dept, Besancon, France
[4] Hop Univ Geneve, Emergency Dept, Geneva, Switzerland
Source
NEURAL COMPUTING & APPLICATIONS | 2023, Vol. 36, No. 2
Keywords
Offline RL; Risk-averse RL; High-dimensional RL; Distributional RL; RISK; REPRESENTATION; GO;
DOI
10.1007/s00521-023-09029-3
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) has emerged as a promising paradigm for real-world applications, since it aims to train policies directly from datasets of past interactions with the environment. In the past few years, algorithms have been introduced to learn from high-dimensional observational states in offline settings. The general idea of these methods is to encode the environment into a latent space and to train policies on top of this smaller representation. In this paper, we extend this general approach to stochastic environments (i.e., environments with a stochastic reward function) and consider a risk measure instead of the classical expected return. First, we show that, under some assumptions, minimizing a risk measure in the latent space is equivalent to minimizing it in the natural space. Building on this result, we present Latent Offline Distributional Actor-Critic (LODAC), an algorithm that trains policies in high-dimensional, stochastic, offline settings to minimize a given risk measure. Empirically, we show that using LODAC to minimize Conditional Value-at-Risk (CVaR) outperforms previous methods in terms of both CVaR and return on stochastic environments.
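The abstract evaluates policies by the Conditional Value-at-Risk of their return rather than by the expected return. As a rough illustration of that criterion only (a minimal sketch, not the authors' LODAC algorithm; the sampled returns and the level alpha below are illustrative assumptions), the following Python snippet estimates CVaR from Monte-Carlo samples of a return distribution:

    # Minimal sketch: estimating CVaR_alpha of a return distribution from samples.
    # Not the LODAC implementation; data and alpha are illustrative assumptions.
    import numpy as np

    def cvar(returns: np.ndarray, alpha: float = 0.1) -> float:
        """Mean of the worst alpha-fraction of sampled returns.

        CVaR_alpha is the expected return conditioned on falling at or below
        the alpha-quantile; risk-averse RL optimizes this tail statistic
        instead of the plain mean.
        """
        var = np.quantile(returns, alpha)   # Value-at-Risk (alpha-quantile)
        tail = returns[returns <= var]      # worst-case outcomes
        return float(tail.mean())

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Two hypothetical policies with the same mean return but different tails.
        safe = rng.normal(loc=10.0, scale=1.0, size=10_000)
        risky = rng.normal(loc=10.0, scale=5.0, size=10_000)
        print("mean safe/risky:", safe.mean(), risky.mean())
        print("CVaR safe/risky:", cvar(safe), cvar(risky))  # risky has a much lower CVaR

In this example the two hypothetical policies have the same average return, but the one with heavier lower tails scores much worse under CVaR, which is the kind of distinction the abstract's risk-averse objective is meant to capture.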
Pages: 585-598
Page count: 14
Related papers
50 records in total
  • [1] Offline reinforcement learning in high-dimensional stochastic environments
    Félicien Hêche
    Oussama Barakat
    Thibaut Desmettre
    Tania Marx
    Stephan Robert-Nicoud
    Neural Computing and Applications, 2024, 36 : 585 - 598
  • [2] Optimizing high-dimensional stochastic forestry via reinforcement learning
    Tahvonen, Olli
    Suominen, Antti
    Malo, Pekka
    Viitasaari, Lauri
    Parkatti, Vesa-Pekka
    JOURNAL OF ECONOMIC DYNAMICS & CONTROL, 2022, 145
  • [3] High-dimensional Function Optimisation by Reinforcement Learning
    Wu, Q. H.
    Liao, H. L.
    2010 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2010,
  • [4] Reinforcement learning for high-dimensional problems with symmetrical actions
    Kamal, MAS
    Murata, J
    2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004, : 6192 - 6197
  • [5] Challenges in High-Dimensional Reinforcement Learning with Evolution Strategies
    Mueller, Nils
    Glasmachers, Tobias
    PARALLEL PROBLEM SOLVING FROM NATURE - PPSN XV, PT II, 2018, 11102 : 411 - 423
  • [6] Emergent Solutions to High-Dimensional Multitask Reinforcement Learning
    Kelly, Stephen
    Heywood, Malcolm, I
    EVOLUTIONARY COMPUTATION, 2018, 26 (03) : 347 - 380
  • [7] Machine learning for high-dimensional dynamic stochastic economies
    Scheidegger, Simon
    Bilionis, Ilias
    JOURNAL OF COMPUTATIONAL SCIENCE, 2019, 33 : 68 - 82
  • [8] High-Dimensional Stock Portfolio Trading with Deep Reinforcement Learning
    Pigorsch, Uta
    Schaefer, Sebastian
    2022 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE FOR FINANCIAL ENGINEERING AND ECONOMICS (CIFER), 2022,
  • [9] Multiagent reinforcement learning with the partly high-dimensional state space
    Department of Electrical and Computer Engineering, Nagoya Institute of Technology, Nagoya, 466-8555, Japan
    SYSTEMS AND COMPUTERS IN JAPAN, 2006, (9): 22 - 31
  • [10] A Deep Reinforcement Learning Framework for High-Dimensional Circuit Linearization
    Rong, Chao
    Paramesh, Jeyanandh
    Carley, L. Richard
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (09) : 3665 - 3669