Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

Citations: 0
Authors
Fei, Yingjie [1 ]
Yang, Zhuoran [2 ]
Chen, Yudong [3 ]
Wang, Zhaoran [1 ]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Princeton Univ, Princeton, NJ USA
[3] Univ Wisconsin Madison, Madison, WI USA
Source
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, Vol. 34
Funding
U.S. National Science Foundation
Keywords
TIME MARKOV-PROCESSES; INFINITE-HORIZON RISK; DECISION-PROCESSES;
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap. To remedy these deficiencies, we investigate a simple transformation of the risk-sensitive Bellman equations, which we call the exponential Bellman equation. The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism. We show that these analytic and algorithmic innovations together lead to improved regret upper bounds over existing ones.
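The entropic risk measure the abstract refers to replaces the expected return with an exponential-utility functional. A minimal sketch (illustrative only, not code from the paper) of how it behaves on a discrete reward distribution, and of the exponential transform that, applied to a value function, underlies an exponential Bellman equation:

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk measure: (1/beta) * log E[exp(beta * X)].

    beta < 0 gives a risk-averse evaluation, beta > 0 a risk-seeking
    one, and the limit beta -> 0 recovers the ordinary expectation.
    """
    if beta == 0:
        return sum(p * v for p, v in zip(probs, values))
    # Log of the moment-generating function at beta, rescaled by 1/beta.
    mgf = sum(p * math.exp(beta * v) for p, v in zip(probs, values))
    return math.log(mgf) / beta

# A fair coin paying reward 0 or 1:
values, probs = [0.0, 1.0], [0.5, 0.5]
print(entropic_risk(values, probs, beta=0.0))   # risk-neutral: exactly 0.5
print(entropic_risk(values, probs, beta=-2.0))  # risk-averse: below 0.5
print(entropic_risk(values, probs, beta=2.0))   # risk-seeking: above 0.5
```

The "exponential Bellman equation" idea is that working with exp(beta * V) instead of V turns the log-of-expectation recursion above into a plain expectation of products, which is easier to analyze with standard Bellman-backup tools.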
Pages: 11
Related Papers (50 in total)
  • [1] Regret Bounds for Risk-Sensitive Reinforcement Learning
    Bastani, Osbert
    Ma, Yecheng Jason
    Shen, Estelle
    Xu, Wanqiao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [2] Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
    Liang, Hao
    Luo, Zhi-Quan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [3] Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures
    Liang, Hao
    Luo, Zhi-Quan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [4] Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning
    Fei, Yingjie
    Xu, Ruitu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [5] Learning Bounds for Risk-sensitive Learning
    Lee, Jaeho
    Park, Sejun
    Shin, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] Risk-Sensitive Reinforcement Learning
    Shen, Yun
    Tobia, Michael J.
    Sommer, Tobias
    Obermayer, Klaus
    NEURAL COMPUTATION, 2014, 26 (07) : 1298 - 1328
  • [7] Risk-Sensitive Reinforcement Learning
    Mihatsch, Oliver
    Neuneier, Ralph
    MACHINE LEARNING, 2002, 49 (2-3) : 267 - 290
  • [8] A Tighter Problem-Dependent Regret Bound for Risk-Sensitive Reinforcement Learning
    Hu, Xiaoyan
    Leung, Ho-Fung
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [9] On tight bounds for function approximation error in risk-sensitive reinforcement learning
    Karmakar, Prasenjit
    Bhatnagar, Shalabh
    SYSTEMS & CONTROL LETTERS, 2021, 150