Transition-based versus State-based Reward Functions for MDPs with Value-at-Risk

Times cited: 0
Authors
Ma, Shuai [1]
Yu, Jia Yuan [1]
Affiliation
[1] Concordia Univ, Fac Engn & Comp Sci, Concordia Inst Informat Syst Engn, 1515 Ste Catherine St West, Montreal, PQ, Canada
Keywords
OPTIMIZATION; VARIANCE; CRITERIA; MODELS;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In reinforcement learning, a reward function defined on the current state and action is widely used. When the objective concerns only the expectation of the (discounted) total reward, this works perfectly. However, if the objective involves the distribution of the total reward, the result can be wrong. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions that share the same expectations. First, we show that under a VaR objective, when the real reward function is transition-based (depending on the action and on both the current and next states), the simplified state-based reward function (depending on the action and the current state only) changes the VaR. Second, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Third, since this estimation method applies to a Markov reward process whose reward function depends on the current state only, we present a transformation algorithm for a Markov reward process whose reward function depends on both the current and next states, so that the VaR function can be estimated with the total reward distribution left intact.
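The first claim (same expectation, different VaR) can be illustrated with a toy example. The Python sketch below is not the authors' method; it is a minimal illustration under stated assumptions: a hypothetical two-state Markov reward process (a fixed policy, so the action is dropped), a hypothetical transition-based reward `R_TRANS[s][s']`, its state-based simplification `R_STATE[s]` obtained by averaging over the next state, an undiscounted total reward over a short horizon, and VaR taken as the lower alpha-quantile of the total reward (one common convention, assumed here). Both total-reward distributions are computed exactly by dynamic programming over (state, accumulated reward).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical 2-state Markov reward process (fixed policy assumed, so the
# action is dropped). P[s][t] is the probability of moving from s to t.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 2)]]

# Hypothetical transition-based reward r(s, s'): depends on current AND next state.
R_TRANS = [[0, 10],
           [0, 10]]

# State-based simplification r(s) = sum_{s'} P(s, s') r(s, s').
# Same one-step expectation, hence the same expected total reward.
R_STATE = [sum(P[s][t] * R_TRANS[s][t] for t in range(2)) for s in range(2)]


def total_reward_dist(reward, horizon, s0=0):
    """Exact distribution {total_reward: probability} of the undiscounted
    total reward over `horizon` steps, where reward(s, t) is earned on s -> t."""
    dist = {(s0, 0): Fraction(1)}  # joint law of (current state, reward so far)
    for _ in range(horizon):
        nxt = defaultdict(Fraction)
        for (s, acc), p in dist.items():
            for t in range(len(P)):
                if P[s][t]:
                    nxt[(t, acc + reward(s, t))] += p * P[s][t]
        dist = nxt
    out = defaultdict(Fraction)
    for (_, acc), p in dist.items():  # marginalize out the final state
        out[acc] += p
    return dict(out)


def mean(dist):
    return sum(x * p for x, p in dist.items())


def value_at_risk(dist, alpha):
    """Assumed convention: lower alpha-quantile inf{x : P(X <= x) >= alpha}."""
    cum = Fraction(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum >= alpha:
            return x
    return max(dist)


T, alpha = 3, Fraction(1, 10)
d_trans = total_reward_dist(lambda s, t: R_TRANS[s][t], T)
d_state = total_reward_dist(lambda s, t: R_STATE[s], T)
print(mean(d_trans), mean(d_state))                                   # 15 15
print(value_at_risk(d_trans, alpha), value_at_risk(d_state, alpha))   # 0 15
```

With these assumed numbers both reward functions give an expected total of 15, but the state-based version makes the total reward deterministic, so its 10% VaR is 15, while the transition-based total is 10 times a Binomial(3, 1/2) variable with a 10% VaR of 0. Averaging over the next state preserves the mean but discards the spread that the VaR depends on.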
Pages: 974-981
Number of pages: 8
Related papers
50 in total
  • [1] State-based versus reward-based motivation in younger and older adults
    Worthy, Darrell A.
    Cooper, Jessica A.
    Byrne, Kaileigh A.
    Gorlick, Marissa A.
    Maddox, W. Todd
    COGNITIVE AFFECTIVE & BEHAVIORAL NEUROSCIENCE, 2014, 14 (04) : 1208 - 1220
  • [3] Portfolio Optimization with Reward-Risk Ratio Measure based on the Conditional Value-at-Risk
    Ogryczak, Wlodzimierz
    Przyluski, Michal
    Sliwinski, Tomasz
    WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, WCECS 2015, VOL II, 2015, : 913 - +
  • [4] Value-at-Risk based portfolio optimization
    Von Puelz, A
    STOCHASTIC OPTIMIZATION: ALGORITHMS AND APPLICATIONS, 2001, 54 : 279 - 302
  • [5] PORTFOLIO OPTIMIZATION BASED ON VALUE-AT-RISK
    Marinescu, Ilie
    PROCEEDINGS OF THE ROMANIAN ACADEMY SERIES A-MATHEMATICS PHYSICS TECHNICAL SCIENCES INFORMATION SCIENCE, 2013, 14 (03) : 187 - 192
  • [6] Bayesian forecasting of Value-at-Risk based on variant smooth transition heteroskedastic models
    Chen, Cathy W. S.
    Weng, Monica M. C.
    Watanabe, Toshiaki
    STATISTICS AND ITS INTERFACE, 2017, 10 (03) : 451 - 470
  • [7] Fuzzy Portfolio Selection based on Value-at-Risk
    Wang, Bo
    Wang, Shuming
    Watada, Junzo
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 1840 - 1845
  • [8] Backtesting value-at-risk based on tail losses
    Wong, Woon K.
    JOURNAL OF EMPIRICAL FINANCE, 2010, 17 (03) : 526 - 538
  • [9] Event-Based Historical Value-at-Risk
    Hogenboom, Frederik
    de Winter, Michael
    Jansen, Milan
    Hogenboom, Alexander
    Frasincar, Flavius
    Kaymak, Uzay
    2012 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR FINANCIAL ENGINEERING & ECONOMICS (CIFER), 2012, : 164 - 170
  • [10] Value-at-Risk Backtesting Procedures Based on Loss Functions: Simulation Analysis of the Power of Tests
    Piontek, Krzysztof
    DATA ANALYSIS, MACHINE LEARNING AND KNOWLEDGE DISCOVERY, 2014, : 273 - 281