General discounting versus average reward

Author
Hutter, Marcus
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to infinity (discounted value). We consider essentially arbitrary (non-geometric) discount sequences and arbitrary reward sequences (non-MDP environments). We show that asymptotically U for m -> infinity and V for k -> infinity are equal, provided both limits exist. Further, if the effective horizon grows linearly with k or faster, then the existence of the limit of U implies that the limit of V exists. Conversely, if the effective horizon grows linearly with k or slower, then the existence of the limit of V implies that the limit of U exists.
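The two quantities compared in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's own construction: it assumes U is the plain mean of rewards over cycles 1..m, and that V is the discount-weighted sum of rewards from cycle k onward, normalized by the sum of the discount weights from k onward. The function names, the alternating reward sequence, the quadratic discount, and the finite `cutoff` that stands in for the infinite sum are all hypothetical choices made for this example.

```python
from typing import Callable


def average_value(reward: Callable[[int], float], m: int) -> float:
    """Average reward U over cycles 1..m."""
    return sum(reward(i) for i in range(1, m + 1)) / m


def discounted_value(
    reward: Callable[[int], float],
    discount: Callable[[int], float],
    k: int,
    cutoff: int,
) -> float:
    """Normalized discounted reward V from cycle k onward.

    Assumption of this sketch: the infinite sum is approximated by
    stopping at `cutoff`; numerator and normalizer share the truncation.
    """
    weighted = 0.0
    total_weight = 0.0
    for i in range(k, cutoff + 1):
        g = discount(i)
        weighted += g * reward(i)
        total_weight += g
    return weighted / total_weight


# Hypothetical inputs: reward 1 on even cycles and 0 on odd cycles,
# with a non-geometric discount gamma_i = 1 / i^2.
def alternating_reward(i: int) -> float:
    return 1.0 if i % 2 == 0 else 0.0


def quadratic_discount(i: int) -> float:
    return 1.0 / (i * i)


u = average_value(alternating_reward, m=1000)
v = discounted_value(alternating_reward, quadratic_discount, k=100, cutoff=100_000)
print(f"U(1..1000) = {u:.4f}, V(k=100) ~= {v:.4f}")
```

For this reward sequence the average value settles at one half, and the discounted value under the quadratic discount approaches the same number as k grows, in line with the equality of limits stated in the abstract. Swapping in a geometric discount such as `lambda i: 0.9 ** i` makes V depend on the parity of k, so it has no limit even though U does; this is consistent with the abstract, since a fixed horizon is covered only by the direction from V to U.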
Pages: 244-258
Page count: 15
Related papers
50 in total
  • [1] Average Reward Optimization with Multiple Discounting Reinforcement Learners
    Reinke, Chris
    Uchibe, Eiji
    Doya, Kenji
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 789 - 800
  • [2] The effects of real versus hypothetical reward on delay and probability discounting
    Hinvest, Neal S.
    Anderson, Ian M.
    [J]. QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2010, 63 (06): : 1072 - 1084
  • [3] Discounting of shared reward and selfishness
    Ito, M
    Saeki, D
    [J]. INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2000, 35 (3-4) : 435 - 435
  • [4] Delayed reward and cost discounting
    Murphy, JG
    Vuchinich, RE
    Simpson, CA
    [J]. PSYCHOLOGICAL RECORD, 2001, 51 (04): : 571 - 588
  • [5] On Average Versus Discounted Reward Temporal-Difference Learning
    John N. Tsitsiklis
    Benjamin Van Roy
    [J]. Machine Learning, 2002, 49 : 179 - 191
  • [6] On average versus discounted reward temporal-difference learning
    Tsitsiklis, JN
    Van Roy, B
    [J]. MACHINE LEARNING, 2002, 49 (2-3) : 179 - 191
  • [7] Discounting Future Reward in an Uncertain World
    Story, G. W.
    Kurth-Nelson, Z.
    Moutoussis, M.
    Iigaya, K.
    Will, G.-J.
    Hauser, T. U.
    Blain, B.
    Vlaev, I.
    Dolan, R. J.
    [J]. DECISION-WASHINGTON, 2023, : 255 - 282
  • [8] Control of movements and temporal discounting of reward
    Shadmehr, Reza
    [J]. CURRENT OPINION IN NEUROBIOLOGY, 2010, 20 (06) : 726 - 730
  • [9] Reward contrast in delay and probability discounting
    Zhijie Dai
    Randolph C. Grace
    Simon Kemp
    [J]. Learning & Behavior, 2009, 37 : 281 - 288
  • [10] Reward contrast in delay and probability discounting
    Dai, Zhijie
    Grace, Randolph C.
    Kemp, Simon
    [J]. LEARNING & BEHAVIOR, 2009, 37 (03) : 281 - 288