General discounting versus average reward

Cited by: 0

Authors:

Hutter, Marcus

Affiliation:

Source:

ALGORITHMIC LEARNING THEORY, PROCEEDINGS | 2006 / Vol. 4264

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbitrary (non-geometric) discount sequences and arbitrary reward sequences (non-MDP environments). We show that asymptotically U for m → ∞ and V for k → ∞ are equal, provided both limits exist. Further, if the effective horizon grows linearly with k or faster, then the existence of the limit of U implies that the limit of V exists. Conversely, if the effective horizon grows linearly with k or slower, then the existence of the limit of V implies that the limit of U exists.
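The quantities compared in the abstract can be illustrated numerically. The sketch below is not taken from the paper. It assumes the usual forms U = (1/m) Σ_{i=1..m} r_i and V = (Σ_{i≥k} γ_i r_i) / (Σ_{i≥k} γ_i), and it assumes the effective horizon is the smallest h at which the remaining discount mass from cycle k+h onward drops to half of the mass from cycle k onward. The infinite sums are truncated at a finite `horizon`, and the reward and discount sequences (`alternating_reward`, `quadratic_discount`) are hypothetical examples chosen for illustration.

```python
from typing import Callable

Sequence = Callable[[int], float]  # maps a cycle index i >= 1 to a number


def average_value(reward: Sequence, m: int) -> float:
    """Average reward U over cycles 1..m."""
    return sum(reward(i) for i in range(1, m + 1)) / m


def discounted_value(reward: Sequence, discount: Sequence, k: int,
                     horizon: int = 200_000) -> float:
    """Normalized discounted reward V from cycle k on.

    Assumption: the infinite sums are truncated after `horizon` cycles.
    """
    weighted = 0.0
    mass = 0.0
    for i in range(k, k + horizon):
        g = discount(i)
        weighted += g * reward(i)
        mass += g
    return weighted / mass


def effective_horizon(discount: Sequence, k: int,
                      horizon: int = 200_000) -> int:
    """Smallest h such that the (truncated) discount mass from cycle k+h on
    is at most half of the mass from cycle k on. Assumed definition."""
    total = sum(discount(i) for i in range(k, k + horizon))
    remaining = total
    h = 0
    while remaining > total / 2:
        remaining -= discount(k + h)
        h += 1
    return h


def alternating_reward(i: int) -> float:
    """Hypothetical reward: 1 on odd cycles, 0 on even cycles."""
    return 1.0 if i % 2 == 1 else 0.0


def quadratic_discount(i: int) -> float:
    """Hypothetical non-geometric discount, gamma_i = 1 / i^2."""
    return 1.0 / (i * i)


u = average_value(alternating_reward, 10_000)
v = discounted_value(alternating_reward, quadratic_discount, 1_000)
h = effective_horizon(quadratic_discount, 1_000)
print(f"U = {u:.4f}, V = {v:.4f}, effective horizon at k=1000: {h}")
```

For this particular pair of sequences both values land near 0.5, and the computed horizon is close to k, which is the linear-growth boundary case mentioned in the abstract. A geometric discount would instead give a horizon that does not grow with k.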

Pages: 244-258

Page count: 15

