Reinforcement learning with immediate rewards and linear hypotheses

Cited by: 44
Authors
Abe, N
Biermann, AW
Long, PM
Institutions
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems;
DOI
10.1007/s00453-003-1038-1
Chinese Library Classification (CLC)
TP31 [Computer software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately and a linear function, unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of two binary-valued rewards is obtained in this way. For these cases we provide bounds on the per-trial regret of our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds showing that this rate of convergence is nearly optimal.
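The setting described in the abstract — immediate rewards (approximately) linear in a per-action feature vector — corresponds to what is now commonly called a linear contextual bandit. The following is a minimal illustrative sketch of that setting, not the authors' algorithm: it combines a ridge-regression estimate of the unknown weight vector with ε-greedy exploration, where the problem sizes, noise level, and exploration rate are all assumed for the example.

```python
import numpy as np

# Sketch of learning with immediate rewards and a linear hypothesis:
# the expected reward of an action is the inner product of an unknown
# weight vector w* with that action's feature vector.  Illustrative
# epsilon-greedy ridge-regression learner, NOT the paper's algorithm.

rng = np.random.default_rng(0)
d, n_actions, n_trials = 5, 4, 2000
w_star = rng.normal(size=d)              # unknown true linear function

A = np.eye(d)                            # ridge-regularized Gram matrix
b = np.zeros(d)                          # accumulated reward-weighted features
total_reward = 0.0

for t in range(n_trials):
    X = rng.normal(size=(n_actions, d))  # one feature vector per action
    w_hat = np.linalg.solve(A, b)        # current ridge estimate of w*
    if rng.random() < 0.1:               # explore with probability 0.1
        a = int(rng.integers(n_actions))
    else:                                # exploit the current estimate
        a = int(np.argmax(X @ w_hat))
    x = X[a]
    r = x @ w_star + 0.1 * rng.normal()  # immediate noisy reward
    A += np.outer(x, x)                  # rank-one statistics update
    b += r * x
    total_reward += r

estimation_error = float(np.linalg.norm(np.linalg.solve(A, b) - w_star))
```

As trials accumulate, the estimate of the linear function improves and the average per-trial regret shrinks, which is the qualitative behavior the paper's bounds quantify.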
Pages: 263 - 293
Page count: 31
Related Papers
50 records in total
  • [1] Reinforcement Learning with Immediate Rewards and Linear Hypotheses
    Naoki Abe
    Alan W. Biermann
    Philip M. Long
    Algorithmica , 2003, 37 : 263 - 293
  • [2] Reinforcement Learning with Perturbed Rewards
    Wang, Jingkang
    Liu, Yang
    Li, Bo
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6202 - 6209
  • [3] IMMEDIATE LEARNING REINFORCEMENT
    HAYES, RB
    AV COMMUNICATION REVIEW, 1966, 14 (03) : 377 - 381
  • [4] Learning Intrinsic Symbolic Rewards in Reinforcement Learning
    Sheikh, Hassam Ullah
    Khadka, Shauharda
    Miret, Santiago
    Majumdar, Somdeb
    Phielipp, Mariano
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [5] Online learning of shaping rewards in reinforcement learning
    Grzes, Marek
    Kudenko, Daniel
    NEURAL NETWORKS, 2010, 23 (04) : 541 - 550
  • [6] Reinforcement Learning With Temporal Logic Rewards
    Li, Xiao
    Vasile, Cristian-Ioan
    Belta, Calin
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 3834 - 3839
  • [7] Reinforcement Learning with Multiple Shared Rewards
    Guisi, Douglas M.
    Ribeiro, Richardson
    Teixeira, Marcelo
    Borges, Andre P.
    Enembreck, Fabricio
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 : 855 - 864
  • [8] Intermittent Reinforcement Learning with Sparse Rewards
    Sahoo, Prachi Pratyusha
    Vamvoudakis, Kyriakos G.
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 2709 - 2714
  • [9] Reinforcement Learning for Joint Optimization of Multiple Rewards
    Agarwal, Mridul
    Aggarwal, Vaneet
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [10] Detecting Rewards Deterioration in Episodic Reinforcement Learning
    Greenberg, Ido
    Mannor, Shie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139