Transition-based versus State-based Reward Functions for MDPs with Value-at-Risk

Cited by: 0
Authors
Ma, Shuai [1]
Yu, Jia Yuan [1]
Affiliations
[1] Concordia Univ, Fac Engn & Comp Sci, Concordia Inst Informat Syst Engn, 1515 Ste Catherine St West, Montreal, PQ, Canada
Keywords
OPTIMIZATION; VARIANCE; CRITERIA; MODELS;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In reinforcement learning, a reward function defined on the current state and action is widely used. This works perfectly when the objective concerns only the expectation of the (discounted) total reward. However, if the objective involves the distribution of the total reward, this simplification gives wrong results. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions that share the same expectations. First, we show that with a VaR objective, when the real reward function is transition-based (with respect to the action and both the current and next states), the simplified reward function (state-based, with respect to the action and the current state only) will change the VaR. Second, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Third, since the estimation method is for a Markov reward process with the reward function on the current state only, we present a transformation algorithm for the Markov reward process with the reward function on current and next states, in order to estimate the VaR function with an intact total reward distribution.
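The first claim of the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-state Markov reward process, the reward values, the undiscounted two-step horizon, and the convention that VaR at level alpha is the lower alpha-quantile of the total reward. The state-based reward is obtained by averaging the transition-based reward over next states, so both versions share the same expected total reward, yet their VaR values differ.

```python
from collections import defaultdict

# Hypothetical two-state Markov reward process under a fixed policy.
# P[s][s_next] is the transition probability.
P = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
# Hypothetical transition-based reward r(s, s_next).
R_TRANS = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 1.0}


def state_based_reward(P, r_trans):
    """Simplify r(s, s_next) to r(s) by taking the expectation over s_next."""
    return {s: sum(p * r_trans[(s, s2)] for s2, p in P[s].items()) for s in P}


def total_reward_distribution(P, reward, start, horizon, transition_based):
    """Exact distribution of the undiscounted total reward over `horizon` steps."""
    dist = {(start, 0.0): 1.0}  # (state, accumulated reward) -> probability
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, total), prob in dist.items():
            for s2, p in P[s].items():
                r = reward[(s, s2)] if transition_based else reward[s]
                nxt[(s2, total + r)] += prob * p
        dist = nxt
    out = defaultdict(float)
    for (_, total), prob in dist.items():
        out[total] += prob  # marginalize out the final state
    return dict(out)


def value_at_risk(dist, alpha):
    """Lower alpha-quantile: inf{x : P(total <= x) >= alpha} (assumed convention)."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum >= alpha - 1e-12:
            return x
    return max(dist)


def expectation(dist):
    return sum(x * p for x, p in dist.items())


R_STATE = state_based_reward(P, R_TRANS)
dist_trans = total_reward_distribution(P, R_TRANS, 0, 2, transition_based=True)
dist_state = total_reward_distribution(P, R_STATE, 0, 2, transition_based=False)

for name, dist in (("transition-based", dist_trans), ("state-based", dist_state)):
    print(name, "mean:", expectation(dist), "VaR(0.1):", value_at_risk(dist, 0.1))
```

In this toy setting the simplified reward collapses the total reward onto a single value, which leaves the mean unchanged but moves the low quantiles. This is the kind of distortion the abstract attributes to the state-based simplification.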
Pages: 974-981
Page count: 8
Related papers
50 in total
  • [21] Simulation-based uniform value function estimates of discounted and average-reward MDPs
    Jain, R
    Varaiya, P
    2004 43RD IEEE CONFERENCE ON DECISION AND CONTROL (CDC), VOLS 1-5, 2004, : 4405 - 4410
  • [22] Requirements development in product design - A state- and state transition-based approach
    Grabowski, Hans
    Lossack, Ralf-Stefan
    Bruch, Christine
    TOOLS AND METHODS OF COMPETITIVE ENGINEERING Vols 1 and 2, 2004, : 1087 - 1088
  • [23] TR-FSM: Transition-Based Reconfigurable Finite State Machine
    Glaser, Johann
    Damm, Markus
    Haase, Jan
    Grimm, Christoph
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2011, 4 (03)
  • [24] Oxidation State-Based Selectivity Tuning in Transition Metal Catalysis
    Zou, Tian-Yi
    Zhang, Qing-Wei
    CHEMCATCHEM, 2024, 16 (08)
  • [25] An approach to capital allocation based on mean conditional value-at-risk
    Han, Yuecai
    Zhang, Fengtong
    Liu, Xinyu
    JOURNAL OF RISK, 2023, 25 (06): : 53 - 71
  • [26] Steel Products' Pledging Rate Based on Value-at-Risk Model
    Yang, Haoxiong
    Fang, Yanan
    Zhou, Jingjie
    Zhou, Yongsheng
    2012 FIFTH INTERNATIONAL CONFERENCE ON BUSINESS INTELLIGENCE AND FINANCIAL ENGINEERING (BIFE), 2012, : 332 - 336
  • [27] Value-at-Risk Based Portfolio Management in Electric Power Sector
    Shi, Ran
    Zhong, Jin
    WMSCI 2008: 12TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL VI, PROCEEDINGS, 2008, : 249 - 253
  • [28] Uncertain random portfolio optimization models based on value-at-risk
    Qin, Zhongfeng
    Dai, Yuanzhen
    Zheng, Haitao
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 32 (06) : 4523 - 4531
  • [29] A Conditional Value-at-Risk Based Inexact Water Allocation Model
    L. G. Shao
    X. S. Qin
    Y. Xu
    Water Resources Management, 2011, 25 : 2125 - 2145
  • [30] Forecasting Value-at-Risk with a duration-based POT method
    Araujo Santos, P.
    Fraga Alves, M. I.
    MATHEMATICS AND COMPUTERS IN SIMULATION, 2013, 94 : 295 - 309