A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

Cited by: 1
Authors
Ma, Shuai [1 ]
Ma, Xiaoteng [2 ]
Xia, Li [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Business, Guangzhou 510275, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100086, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Dynamic programming; Markov decision process; Discounted mean-variance; Bilevel optimization; Bellman local-optimality equation; PORTFOLIO SELECTION; PROSPECT-THEORY; TRADEOFFS; MODEL; RISK;
DOI
10.1016/j.ejor.2023.06.022
Chinese Library Classification
C93 [Management]
Subject classification codes
12; 1201; 1202; 120202
Abstract
This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric involved captures reward variability over the whole process, with future deviations discounted to their present values. This discounted mean-variance optimization yields a reward function that depends on the discounted mean, and this dependency renders traditional dynamic programming methods inapplicable because it destroys a crucial property, time consistency. To handle this unorthodox problem, we introduce a pseudo mean that transforms the intractable MDP into a standard one with a redefined reward function, and we derive a discounted mean-variance performance difference formula. Building on the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, risk-averse variance and mean-variance optimizations in discounted and average MDPs. Furthermore, convergence analyses missing from the literature can also be supplied within the proposed framework. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm. © 2023 Elsevier B.V. All rights reserved.
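The bilevel structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the transformed reward `r - beta * (r - y)**2`, the pseudo-mean update rule (setting `y` to the normalized discounted mean reward of the current greedy policy from a uniform start distribution), and the names `mv_value_iteration` and `beta` are all hypothetical choices for this sketch.

```python
import numpy as np

def mv_value_iteration(P, R, gamma=0.9, beta=0.1, outer_iters=50, tol=1e-8):
    """Bilevel sketch of discounted mean-variance value iteration.

    P: transition tensor, shape (S, A, S); R: reward matrix, shape (S, A).
    beta weights the variance penalty. The pseudo-mean update rule below
    is an assumption for illustration, not the paper's exact rule.
    """
    S, A = R.shape
    y = 0.0                                   # pseudo mean, fixed in the inner level
    for _ in range(outer_iters):
        # Inner level: standard value iteration on the transformed reward,
        # which restores time consistency for a fixed pseudo mean y.
        R_tilde = R - beta * (R - y) ** 2
        V = np.zeros(S)
        while True:
            Q = R_tilde + gamma * P @ V       # shape (S, A)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        policy = Q.argmax(axis=1)
        # Outer level: update the pseudo mean to the greedy policy's
        # normalized discounted mean reward from a uniform start.
        P_pi = P[np.arange(S), policy]        # (S, S) policy transition matrix
        r_pi = R[np.arange(S), policy]        # (S,)  policy reward vector
        J = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        y_new = (1 - gamma) * J.mean()
        if abs(y_new - y) < tol:
            break
        y = y_new
    return policy, V, y
```

A usage example on a two-state, two-action MDP where action `a` deterministically moves to state `a`: the greedy policy, the transformed value function, and the converged pseudo mean are returned together, so the caller can inspect how far the variance penalty pulled the policy away from the pure mean-maximizing one.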
Pages: 1057-1067 (11 pages)