Reinforcement learning with immediate rewards and linear hypotheses

Cited by: 44
Authors
Abe, N
Biermann, AW
Long, PM
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems;
DOI
10.1007/s00453-003-1038-1
CLC classification number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, in the setting where the consequence of each action is felt immediately and an unknown linear function (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which applying it gives the probability of receiving the larger of two binary-valued rewards. For these cases we provide bounds on the per-trial regret of our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds showing that the rates of convergence are nearly optimal.
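The continuous-reward setting above can be illustrated with a minimal simulation: an unknown weight vector generates each action's expected reward as a linear function of its feature vector, and the learner fits a least-squares estimate online while trading off exploration and exploitation. This is a simplified epsilon-greedy sketch for intuition only, not the authors' algorithm (the paper's methods and regret bounds differ); all names and the noise/exploration parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, T = 5, 10, 20000
w_true = rng.normal(size=d)        # unknown linear hypothesis
w_true /= np.linalg.norm(w_true)

gram = np.eye(d)                   # ridge-regularized Gram matrix: I + sum of x x^T
b = np.zeros(d)                    # sum of reward * x
regret = 0.0
for t in range(1, T + 1):
    X = rng.normal(size=(n_actions, d))     # this trial's feature vectors
    w_hat = np.linalg.solve(gram, b)        # least-squares estimate of w_true
    eps = t ** (-1.0 / 3.0)                 # decaying exploration probability
    if rng.random() < eps:
        a = int(rng.integers(n_actions))    # explore: random action
    else:
        a = int(np.argmax(X @ w_hat))       # exploit: greedy on the estimate
    reward = X[a] @ w_true + 0.1 * rng.normal()   # noisy linear reward
    gram += np.outer(X[a], X[a])            # online least-squares update
    b += reward * X[a]
    regret += (X @ w_true).max() - X[a] @ w_true  # per-trial regret increment

print(regret / T)   # average per-trial regret; shrinks as T grows
```

As the estimate of the weight vector sharpens and the exploration rate decays, the greedy choice is optimal more and more often, so the average per-trial regret tends to zero, which is the qualitative behavior the abstract's upper bounds formalize.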
Pages: 263-293
Page count: 31