Reinforcement learning with immediate rewards and linear hypotheses

Cited: 44
Authors
Abe, N
Biermann, AW
Long, PM
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems
DOI
10.1007/s00453-003-1038-1
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions, with the goal of maximizing cumulative reward, in the setting where the consequence of a given action is felt immediately and a linear function, unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of two binary-valued rewards is (approximately) given by applying that function. For these cases we provide bounds on the per-trial regret of our algorithms that go to zero as the number of trials approaches infinity, together with lower bounds showing that this rate of convergence is nearly optimal.
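The setting in the abstract is what is now commonly called a linear bandit: on each trial the learner observes a feature vector for every available action, selects one, and immediately receives a noisy reward that is approximately linear in the chosen feature vector. The following is a minimal illustrative sketch of the continuous-reward case, not the paper's algorithm: it pairs an online ridge-regression estimate of the unknown linear function with decaying epsilon-greedy exploration, and the dimension, action count, noise level, and exploration schedule are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5            # feature dimension (illustrative assumption)
T = 10_000       # number of trials
n_actions = 10   # actions available on each trial

# Hidden linear reward function (unknown to the learner).
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

A = np.eye(d)      # regularized Gram matrix for ridge regression
b = np.zeros(d)    # accumulated reward-weighted features
cum_regret = 0.0

for t in range(1, T + 1):
    # Fresh feature vectors, one per available action on this trial.
    X = rng.normal(size=(n_actions, d))

    # Ridge-regression estimate of the unknown linear function.
    theta_hat = np.linalg.solve(A, b)

    # Decaying epsilon-greedy exploration (one simple choice; the
    # paper's algorithms use more refined exploration to obtain
    # near-optimal regret rates).
    eps = min(1.0, n_actions / np.sqrt(t))
    if rng.random() < eps:
        i = rng.integers(n_actions)
    else:
        i = int(np.argmax(X @ theta_hat))

    x = X[i]
    reward = x @ theta_star + 0.1 * rng.normal()  # noisy immediate reward

    # Online update of the ridge-regression statistics.
    A += np.outer(x, x)
    b += reward * x

    cum_regret += (X @ theta_star).max() - x @ theta_star

print(f"average per-trial regret after {T} trials: {cum_regret / T:.4f}")
```

Regret here is the per-trial gap between the expected reward of the best available action and of the chosen one; under the linear-reward assumption the average regret of such a scheme shrinks as the estimate of the hidden function improves, which is the quantity the paper's upper and lower bounds characterize.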
Pages: 263-293
Page count: 31
Related Papers
50 items total; items [41]-[50] shown below
  • [41] MoleGuLAR: Molecule Generation Using Reinforcement Learning with Alternating Rewards
    Goel, Manan
    Raghunathan, Shampa
    Laghuvarapu, Siddhartha
    Priyakumar, U. Deva
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2021, 61 (12): 5815-5826
  • [42] Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
    Li, Siyuan
    Wang, Rui
    Tang, Minxue
    Zhang, Chongjie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [43] Multiple rewards fuzzy reinforcement learning algorithm in RoboCup environment
    Li, S
    Yao, JY
    Ye, Z
    Sun, ZQ
    PROCEEDINGS OF THE 2001 IEEE INTERNATIONAL CONFERENCE ON CONTROL APPLICATIONS (CCA'01), 2001: 317-322
  • [44] Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning
    Dietterich, Thomas
    Trimponias, George
    Chen, Zhitang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [45] Finding intrinsic rewards by embodied evolution and constrained reinforcement learning
    Uchibe, Eiji
    Doya, Kenji
    NEURAL NETWORKS, 2008, 21 (10): 1447-1455
  • [46] Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization
    Li, Siyao
    Lei, Deren
    Qin, Pengda
    Wang, William Yang
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 6038-6044
  • [47] Modular neural networks for reinforcement learning with temporal intrinsic rewards
    Takeuchi, Johane
    Shouno, Osamu
    Tsujino, Hiroshi
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007: 1151-1156
  • [48] Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards
    Chan, Hou Pong
    Chen, Wang
    Wang, Lu
    King, Irwin
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019: 2163-2174
  • [49] Erotic cue exposure increases physiological arousal, biases choices toward immediate rewards, and attenuates model-based reinforcement learning
    Mathar, David
    Wiebe, Annika
    Tuzsus, Deniz
    Knauth, Kilian
    Peters, Jan
    PSYCHOPHYSIOLOGY, 2023, 60 (12)
  • [50] The "rewards" of positive reinforcement
    Yin, Sophia
    VETERINARY TECHNICIAN, 2006, 27 (12): 792+