Algorithms for CVaR Optimization in MDPs

Cited by: 0
Authors
Chow, Yinlam [1, 2]
Ghavamzadeh, Mohammad [2, 3]
Affiliations
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Adobe Res, San Jose, CA USA
[3] INRIA Lille, Team SequeL, Lille, France
Keywords
VALUE-AT-RISK; STOCHASTIC-APPROXIMATION; DECISION; COST;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures and, because of its computational efficiency, has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
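As a rough illustration of the risk measure named in the abstract, the sketch below estimates VaR and CVaR from a batch of sampled costs. It is not the paper's policy gradient or actor-critic algorithm. It relies on the standard Rockafellar-Uryasev representation, CVaR_alpha(Z) = min over nu of { nu + E[(Z - nu)^+] / (1 - alpha) }, whose minimum is attained at nu = VaR_alpha(Z). The function name `empirical_var_cvar` and the sample data are assumptions made for this example.

```python
import numpy as np


def empirical_var_cvar(costs, alpha):
    """Estimate (VaR, CVaR) at confidence level alpha from sampled costs.

    Hypothetical helper for illustration only; costs are treated as losses,
    so larger values are worse.
    """
    costs = np.asarray(costs, dtype=float)
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Empirical VaR: the alpha-quantile of the sampled costs.
    var = np.quantile(costs, alpha)
    # Rockafellar-Uryasev form evaluated at nu = VaR:
    # CVaR = VaR + E[(Z - VaR)^+] / (1 - alpha).
    excess = np.maximum(costs - var, 0.0)
    cvar = var + excess.mean() / (1.0 - alpha)
    return var, cvar


# Assumed sample: ten equally likely costs 1, 2, ..., 10.
sample_costs = np.arange(1, 11)
var_80, cvar_80 = empirical_var_cvar(sample_costs, alpha=0.8)
```

For this sample at alpha = 0.8, the CVaR estimate equals the mean of the two largest costs (about 9.5), which matches the reading of CVaR as the expected cost in the worst 20% of outcomes.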
Pages: 9
Related papers
50 in total
  • [1] CVaR Optimization for MDPs: Existence and Computation of Optimal Policies
    Ding R.
    Feinberg E.
    Performance Evaluation Review, 2022, 50 (02): : 39 - 41
  • [2] Policy Gradients for CVaR-Constrained MDPs
    Prashanth, L. A.
    Algorithmic Learning Theory (ALT 2014), 2014, 8776 : 155 - 169
  • [3] Anytime Algorithms for Solving Possibilistic MDPs and Hybrid MDPs
    Bauters, Kim
    Liu, Weiru
    Godo, Lluis
    FOUNDATIONS OF INFORMATION AND KNOWLEDGE SYSTEMS (FOIKS 2016), 2016, 9616 : 24 - 41
  • [4] Flexible Charging Optimization for Electric Vehicles using MDPs-based Online Algorithms
    Tomin, Nikita, V
    Maasmann, Jonas
    Domyshev, Alexandr B.
    IFAC PAPERSONLINE, 2020, 53 (02): : 12614 - 12619
  • [5] Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
    He, Jiafan
    Zhou, Dongruo
    Gu, Quanquan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [6] Efficient solution algorithms for factored MDPs
    Guestrin, C
    Koller, D
    Parr, R
    Venkataraman, S
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2003, 19 : 399 - 468
  • [7] Cutting plane algorithms for mean-CVaR portfolio optimization with nonconvex transaction costs
    Takano Y.
    Nanjo K.
    Sukegawa N.
    Mizuno S.
    Computational Management Science, 2015, 12 (2) : 319 - 340
  • [8] Algorithms for Branching MDPs and Branching Stochastic Games
    Etessami, Kousha
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2016, (220):
  • [9] A Method for Solving a CVaR Optimization
    Zhang, Maojun
    Nan, Jiangxia
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 10139 - +
  • [10] CVaR norm and applications in optimization
    Pavlikov, Konstantin
    Uryasev, Stan
    OPTIMIZATION LETTERS, 2014, 8 (07) : 1999 - 2020