Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures

Cited by: 0

Authors:

Liang, Hao [1]

Luo, Zhi-Quan [1]

Affiliations:

[1] Chinese Univ Hong Kong, Shenzhen, Hong Kong, Peoples R China

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024 / Vol. 238

Keywords:

COHERENCE;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study finite episodic Markov decision processes incorporating dynamic risk measures to capture risk sensitivity. To this end, we present two model-based algorithms applied to Lipschitz dynamic risk measures, a wide range of risk measures that subsumes spectral risk measures, optimized certainty equivalents, and distortion risk measures, among others. We establish both regret upper bounds and lower bounds. Notably, our upper bounds demonstrate optimal dependencies on the number of actions and episodes while reflecting the inherent trade-off between risk sensitivity and sample complexity. Our approach offers a unified framework that not only encompasses multiple existing formulations in the literature but also broadens the application spectrum.

Pages: 32

