Algorithms for CVaR Optimization in MDPs

Cited by: 0
Authors
Chow, Yinlam [1, 2]
Ghavamzadeh, Mohammad [2, 3]
Affiliations
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Adobe Res, San Jose, CA USA
[3] INRIA Lille, Team SequeL, Lille, France
Keywords
VALUE-AT-RISK; STOCHASTIC-APPROXIMATION; DECISION; COST;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures and, because of its computational efficiency, has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
Pages: 9