Algorithms for CVaR Optimization in MDPs

Cited by: 0

Authors:

Chow, Yinlam [1,2]

Ghavamzadeh, Mohammad [2,3]

Affiliations:

[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA

[2] Adobe Res, San Jose, CA USA

[3] INRIA Lille, Team SequeL, Lille, France

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27

Keywords:

VALUE-AT-RISK; STOCHASTIC-APPROXIMATION; DECISION; COST;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures, and because of its computational efficiencies has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms that each uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
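To make the risk measure in the abstract concrete, here is a minimal sketch of estimating CVaR from sampled costs. It relies on the standard Rockafellar-Uryasev representation, CVaR_alpha(Z) = min over nu of { nu + E[(Z - nu)^+] / (1 - alpha) }. The function names `ru_objective` and `empirical_cvar` are hypothetical, chosen for illustration. The sketch only evaluates the risk measure on a fixed sample; it is not the policy gradient or actor-critic algorithm the paper proposes.

```python
def ru_objective(costs, nu, alpha):
    """Rockafellar-Uryasev objective: nu + E[(Z - nu)^+] / (1 - alpha)."""
    # Sample average of the cost in excess of the threshold nu
    excess = sum(max(c - nu, 0.0) for c in costs) / len(costs)
    return nu + excess / (1.0 - alpha)


def empirical_cvar(costs, alpha):
    """Empirical CVaR at confidence level alpha in (0, 1).

    The objective is convex and piecewise linear in nu with kinks only
    at the sample values, so a minimizer can be found among the samples.
    """
    return min(ru_objective(costs, nu, alpha) for nu in costs)
```

For example, with costs 1, 2, ..., 10 and alpha = 0.8, `empirical_cvar` returns approximately 9.5, the average of the two largest costs (the worst 20% of outcomes).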


Pages: 9
