Algorithms for CVaR Optimization in MDPs

Cited by: 0

Authors:

Chow, Yinlam [1,2]

Ghavamzadeh, Mohammad [2,3]

Affiliations:

[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA

[2] Adobe Res, San Jose, CA USA

[3] INRIA Lille, Team SequeL, Lille, France

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27

Keywords:

VALUE-AT-RISK; STOCHASTIC-APPROXIMATION; DECISION; COST;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In many sequential decision-making problems, we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures and, because of its computational efficiency, has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
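
For readers unfamiliar with the risk measure named in the abstract, the following is a minimal sketch of how VaR and CVaR of a cost sample can be estimated. It is not the paper's algorithm. It assumes the standard Rockafellar-Uryasev representation, CVaR = nu + E[(C - nu)^+] / (1 - alpha) evaluated at nu = VaR, which this record does not itself state. The function name `empirical_var_cvar` and its parameters are hypothetical.

```python
import numpy as np


def empirical_var_cvar(costs, alpha):
    """Estimate (VaR, CVaR) of a cost sample at confidence level alpha in (0, 1).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    costs = np.asarray(costs, dtype=float)
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # VaR: the alpha-quantile of the observed costs.
    var = float(np.quantile(costs, alpha))
    # Assumed Rockafellar-Uryasev form: nu + E[(C - nu)^+] / (1 - alpha), with nu = VaR.
    excess = np.maximum(costs - var, 0.0)
    cvar = var + float(excess.mean()) / (1.0 - alpha)
    return var, cvar


if __name__ == "__main__":
    sample = np.arange(1, 11)  # illustrative costs 1, 2, ..., 10
    var, cvar = empirical_var_cvar(sample, alpha=0.8)
    print(f"VaR = {var:.3f}, CVaR = {cvar:.3f}")
```

Because CVaR averages the costs in the tail beyond VaR, the estimate is never smaller than the VaR estimate at the same level.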


Pages: 9

