Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

Citations: 0
Authors
Fei, Yingjie [1 ]
Yang, Zhuoran [2 ]
Chen, Yudong [3 ]
Wang, Zhaoran [1 ]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Princeton Univ, Princeton, NJ USA
[3] Univ Wisconsin Madison, Madison, WI USA
Source
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, Vol. 34
Funding
U.S. National Science Foundation
Keywords
TIME MARKOV-PROCESSES; INFINITE-HORIZON RISK; DECISION-PROCESSES;
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap. To remedy these deficiencies, we investigate a simple transformation of the risk-sensitive Bellman equations, which we call the exponential Bellman equation. The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism. We show that these analytic and algorithmic innovations together lead to improved regret upper bounds over existing ones.
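The entropic risk measure the abstract refers to replaces the expected return with an exponential-utility functional. A minimal sketch (illustrative only, not code from the paper) of how it behaves on a discrete reward distribution, and of the exponential transform that, applied to a value function, underlies an exponential Bellman equation:

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk measure: (1/beta) * log E[exp(beta * X)].

    beta < 0 gives a risk-averse evaluation, beta > 0 a risk-seeking
    one, and the limit beta -> 0 recovers the ordinary expectation.
    """
    if beta == 0:
        return sum(p * v for p, v in zip(probs, values))
    # Log of the moment-generating function at beta, rescaled by 1/beta.
    mgf = sum(p * math.exp(beta * v) for p, v in zip(probs, values))
    return math.log(mgf) / beta

# A fair coin paying reward 0 or 1:
values, probs = [0.0, 1.0], [0.5, 0.5]
print(entropic_risk(values, probs, beta=0.0))   # risk-neutral: exactly 0.5
print(entropic_risk(values, probs, beta=-2.0))  # risk-averse: below 0.5
print(entropic_risk(values, probs, beta=2.0))   # risk-seeking: above 0.5
```

The "exponential Bellman equation" idea is that working with exp(beta * V) instead of V turns the log-of-expectation recursion above into a plain expectation of products, which is easier to analyze with standard Bellman-backup tools.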
Pages: 11
Related Papers (50 in total)
  • [1] Regret Bounds for Risk-Sensitive Reinforcement Learning
    Bastani, Osbert
    Ma, Yecheng Jason
    Shen, Estelle
    Xu, Wanqiao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [2] Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
    Liang, Hao
    Luo, Zhi-Quan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [3] Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures
    Liang, Hao
    Luo, Zhi-Quan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [4] Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning
    Fei, Yingjie
    Xu, Ruitu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [5] Learning Bounds for Risk-sensitive Learning
    Lee, Jaeho
    Park, Sejun
    Shin, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] Risk-Sensitive Reinforcement Learning
    Shen, Yun
    Tobia, Michael J.
    Sommer, Tobias
    Obermayer, Klaus
    NEURAL COMPUTATION, 2014, 26 (07) : 1298 - 1328
  • [7] Risk-Sensitive Reinforcement Learning
    Mihatsch, Oliver
    Neuneier, Ralph
    MACHINE LEARNING, 2002, 49 (2-3) : 267 - 290
  • [8] A Tighter Problem-Dependent Regret Bound for Risk-Sensitive Reinforcement Learning
    Hu, Xiaoyan
    Leung, Ho-Fung
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [9] On tight bounds for function approximation error in risk-sensitive reinforcement learning
    Karmakar, Prasenjit
    Bhatnagar, Shalabh
    SYSTEMS & CONTROL LETTERS, 2021, 150