Transition-based versus State-based Reward Functions for MDPs with Value-at-Risk

Cited by: 0
Authors
Ma, Shuai [1]
Yu, Jia Yuan [1]
Affiliations
[1] Concordia Univ, Fac Engn & Comp Sci, Concordia Inst Informat Syst Engn, 1515 Ste Catherine St West, Montreal, PQ, Canada
Keywords
OPTIMIZATION; VARIANCE; CRITERIA; MODELS;
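DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract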
In reinforcement learning, the reward function is commonly defined on the current state and action. This works perfectly when the objective involves only the expectation of the (discounted) total reward. However, when the objective depends on the distribution of the total reward, this simplification can give wrong results. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) under two reward functions that share the same expectations. First, we show that under a VaR objective, when the true reward function is transition-based (defined on the action and both the current and next states), replacing it with the simplified state-based reward function (defined on the action and the current state only) changes the VaR. Second, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Third, since this estimation method applies to a Markov reward process whose reward function depends on the current state only, we present a transformation algorithm for Markov reward processes whose reward function depends on the current and next states, so that the VaR function can be estimated with the total reward distribution left intact.
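The first claim of the abstract can be illustrated with a minimal sketch. This is not the paper's method or one of its examples: the two-state chain, the reward values, the horizon, the undiscounted total, and the helper names `total_reward_distribution`, `value_at_risk`, and `mean` are all assumptions chosen for illustration, and VaR is taken here as the lower alpha-quantile of the total reward (conventions vary). The state-based reward is built as the expectation of the transition-based one, so both share the same mean while their total reward distributions differ.

```python
from itertools import product

def total_reward_distribution(P, reward, start, horizon):
    """Exact distribution {total_reward: probability} of the undiscounted
    total reward, obtained by enumerating every state path of the given length."""
    n = len(P)
    dist = {}
    for path in product(range(n), repeat=horizon):
        prob, total, s = 1.0, 0.0, start
        for nxt in path:
            prob *= P[s][nxt]
            total += reward(s, nxt)
            s = nxt
        if prob > 0:
            dist[total] = dist.get(total, 0.0) + prob
    return dist

def value_at_risk(dist, alpha):
    """Lower alpha-quantile: smallest x with P(total <= x) >= alpha (assumed convention)."""
    cumulative = 0.0
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= alpha - 1e-12:  # small tolerance for float rounding
            return x
    return max(dist)

def mean(dist):
    return sum(x * p for x, p in dist.items())

# Hypothetical two-state Markov reward process (policy already fixed).
P = [[0.5, 0.5],
     [0.5, 0.5]]
R_trans = [[0.0, 2.0],   # R_trans[s][s_next]: transition-based reward
           [0.0, 2.0]]
# State-based simplification: expected reward given the current state only.
r_state = [sum(P[s][t] * R_trans[s][t] for t in range(2)) for s in range(2)]

dist_trans = total_reward_distribution(P, lambda s, t: R_trans[s][t], start=0, horizon=2)
dist_state = total_reward_distribution(P, lambda s, t: r_state[s], start=0, horizon=2)

print("means:", mean(dist_trans), mean(dist_state))
print("VaR at 0.1:", value_at_risk(dist_trans, 0.1), value_at_risk(dist_state, 0.1))
```

In this toy case the state-based total reward is deterministic, while the transition-based one is spread over three values, so the expectations agree but the quantiles do not. Path enumeration grows exponentially with the horizon, so it only suits the short-horizon setting.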
Pages: 974-981
Page count: 8
Related papers
50 in total
  • [41] GAS and GARCH based value-at-risk modeling of precious metals
    Owusu, Peterson, Jr.
    Tiwari, Aviral Kumar
    Tweneboah, George
    Asafo-Adjei, Emmanuel
    RESOURCES POLICY, 2022, 75
  • [42] Fuzzy Power System Reliability Model Based on Value-at-Risk
    Wang, Bo
    Li, You
    Watada, Junzo
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II, 2010, 6277 : 445 - 453
  • [43] Estimating value-at-risk in electricity market based on grey extreme value theory
    Wang, Ruiqing
    Open Cybernetics and Systemics Journal, 2014, 8 : 896 - 903
  • [44] Portfolio value-at-risk forecasting with GA-based extreme value theory
    Lin, Ping-Chen
    Ko, Po-Chang
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 2503 - 2512
  • [45] Measuring risk in Value-at-Risk based on Student's t-distribution
    Huschens, S
    Kim, JR
    CLASSIFICATION IN THE INFORMATION AGE, 1999, : 453 - 459
  • [46] Risk Assessment for Transmission Network Planning Scheme based on Conditional Value-at-Risk
    Zou Qi
    You Dahai
    Liu Hengwei
    Qian Junjie
    Xu Heng
    Zhao Hongsheng
    2018 8TH INTERNATIONAL CONFERENCE ON POWER AND ENERGY SYSTEMS (ICPES), 2018, : 49 - 53
  • [47] Estimating Portfolio of Bonds Credit Risk Value-at-Risk Based on Copula Function
    Bi Tao
    Zhang Xiaofei
    RECENT ADVANCE IN STATISTICS APPLICATION AND RELATED AREAS, PTS 1 AND 2, 2008, : 1093 - 1097
  • [48] Measurement of HIS Stock Index Futures Market Risk Based on Value-at-Risk
    Gong Zhiyong
    Li Dingan
    PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT, VOLS A-C, 2008, : 1906 - 1911
  • [49] Measurement of HIS Stock Index Futures Market Risk Based on Value-at-Risk
    Yan Dan
    Gong Zhiyong
    2009 INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT, INNOVATION MANAGEMENT AND INDUSTRIAL ENGINEERING, VOL 3, PROCEEDINGS, 2009, : 78 - +
  • [50] Internal State-Based Risk Assessment for Robots in Hazardous Environment
    David, Jennifer
    Bridgwater, Thomas
    West, Andrew
    Lennox, Barry
    Giuliani, Manuel
    TOWARDS AUTONOMOUS ROBOTIC SYSTEMS, TAROS 2022, 2022, 13546 : 137 - 152