Risk-Sensitive Policy with Distributional Reinforcement Learning

Cited by: 2
Authors
Theate, Thibaut [1 ]
Ernst, Damien [1 ,2 ]
Affiliations
[1] Univ Liege, Dept Elect Engn & Comp Sci, B-4031 Liege, Belgium
[2] Inst Polytech Paris, Informat Proc & Commun Lab, F-91120 Paris, France
Keywords
distributional reinforcement learning; sequential decision-making; risk-sensitive policy; risk management; deep neural network
DOI
10.3390/a16070325
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Classical reinforcement learning (RL) techniques are generally concerned with designing decision-making policies that maximise the expected outcome. However, this approach does not take into account the potential risk associated with the actions taken, which may be critical in certain applications. To address this issue, the present research work introduces a novel methodology based on distributional RL for deriving sequential decision-making policies that are sensitive to risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the Q function, which generally stands at the core of learning schemes in RL, with another function that takes into account both the expected return and the risk. Named the risk-based utility function U, it can be extracted from the random return distribution Z naturally learnt by any distributional RL algorithm. This enables spanning the complete trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to a distributional RL algorithm, with an emphasis on the interpretability of the resulting decision-making process.
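The idea described in the abstract can be sketched in a few lines, assuming the random return Z(s, a) is represented by quantile samples (as in quantile-based distributional RL such as QR-DQN) and assuming the risk term is the lower-tail conditional expectation (CVaR). The function and parameter names (`risk_based_utility`, `alpha`, `tail`) are illustrative and not taken from the paper; the exact form of U is defined in the full text.

```python
import math

def risk_based_utility(z_quantiles, alpha=0.5, tail=0.25):
    """Sketch of a risk-based utility U combining expected return and tail risk.

    U = (1 - alpha) * E[Z] + alpha * CVaR_tail(Z)

    z_quantiles: quantile estimates of the random return Z for one (s, a) pair
    alpha:       trade-off weight in [0, 1] (0 = risk-neutral, 1 = fully risk-averse)
    tail:        fraction of the lower tail used for the risk term
    """
    z = sorted(z_quantiles)
    expected_return = sum(z) / len(z)
    k = max(1, math.ceil(tail * len(z)))   # number of worst-case quantiles
    cvar = sum(z[:k]) / k                  # mean of the lower tail of Z
    return (1.0 - alpha) * expected_return + alpha * cvar

def greedy_action(quantiles_per_action, alpha=0.5, tail=0.25):
    """Act greedily with respect to U instead of the usual Q function."""
    utilities = [risk_based_utility(q, alpha, tail) for q in quantiles_per_action]
    return max(range(len(utilities)), key=lambda a: utilities[a])

# A risky action (higher mean, heavy lower tail) versus a safe one:
risky = [-10.0, -5.0, 2.0, 8.0, 12.0]   # E[Z] = 1.4
safe = [0.0, 0.5, 1.0, 1.5, 2.0]        # E[Z] = 1.0
print(greedy_action([risky, safe], alpha=0.0))  # risk-neutral -> 0 (risky action)
print(greedy_action([risky, safe], alpha=1.0))  # risk-averse  -> 1 (safe action)
```

Varying `alpha` between 0 and 1 spans the trade-off the abstract mentions: at 0 the policy reduces to maximising the expected return, while at 1 it only considers the tail of the return distribution.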
Pages: 16
Related Papers
50 records in total
  • [31] Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning
    Fei, Yingjie
    Xu, Ruitu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [32] On tight bounds for function approximation error in risk-sensitive reinforcement learning
    Karmakar, Prasenjit
    Bhatnagar, Shalabh
    SYSTEMS & CONTROL LETTERS, 2021, 150
  • [33] A Reinforcement Learning Look at Risk-Sensitive Linear Quadratic Gaussian Control
    Cui, Leilei
    Basar, Tamer
    Jiang, Zhong-Ping
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [34] Risk-Sensitive Reinforcement Learning Via Entropic-VaR Optimization
    Ni, Xinyi
    Lai, Lifeng
    2022 56TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2022, : 953 - 959
  • [35] Learning Bounds for Risk-sensitive Learning
    Lee, Jaeho
    Park, Sejun
    Shin, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [36] Risk-sensitive online learning
    Even-Dar, Eyal
    Kearns, Michael
    Wortman, Jennifer
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2006, 4264 : 199 - 213
  • [37] Exponential TD Learning: A Risk-Sensitive Actor-Critic Reinforcement Learning Algorithm
    Noorani, Erfaun
    Mavridis, Christos N.
    Baras, John S.
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023, : 4104 - 4109
  • [38] RISK-SENSITIVE OPTIMAL INVESTMENT POLICY
    LEFEBVRE, M
    MONTULET, P
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 1994, 25 (01) : 183 - 192
  • [39] A Risk-Sensitive Approach to Policy Optimization
    Markowitz, Jared
    Gardner, Ryan W.
    Llorens, Ashley
    Arora, Raman
    Wang, I-Jeng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 15019 - 15027
  • [40] Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning
    Fei, Yingjie
    Yang, Zhuoran
    Chen, Yudong
    Wang, Zhaoran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34