Reinforcement learning with immediate rewards and linear hypotheses

Cited: 44
Authors
Abe, N
Biermann, AW
Long, PM
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems
DOI
10.1007/s00453-003-1038-1
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions, with the goal of maximizing cumulative reward, in the setting where the consequence of a given action is felt immediately and a linear function, unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of two binary-valued rewards is (approximately) given by applying that function. For these cases we provide bounds on the per-trial regret of our algorithms that go to zero as the number of trials approaches infinity, together with lower bounds showing that this rate of convergence is nearly optimal.
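The setting in the abstract is what is now commonly called a linear bandit: on each trial the learner observes a feature vector for every available action, selects one, and immediately receives a noisy reward that is approximately linear in the chosen feature vector. The following is a minimal illustrative sketch of the continuous-reward case, not the paper's algorithm: it pairs an online ridge-regression estimate of the unknown linear function with decaying epsilon-greedy exploration, and the dimension, action count, noise level, and exploration schedule are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5            # feature dimension (illustrative assumption)
T = 10_000       # number of trials
n_actions = 10   # actions available on each trial

# Hidden linear reward function (unknown to the learner).
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

A = np.eye(d)      # regularized Gram matrix for ridge regression
b = np.zeros(d)    # accumulated reward-weighted features
cum_regret = 0.0

for t in range(1, T + 1):
    # Fresh feature vectors, one per available action on this trial.
    X = rng.normal(size=(n_actions, d))

    # Ridge-regression estimate of the unknown linear function.
    theta_hat = np.linalg.solve(A, b)

    # Decaying epsilon-greedy exploration (one simple choice; the
    # paper's algorithms use more refined exploration to obtain
    # near-optimal regret rates).
    eps = min(1.0, n_actions / np.sqrt(t))
    if rng.random() < eps:
        i = rng.integers(n_actions)
    else:
        i = int(np.argmax(X @ theta_hat))

    x = X[i]
    reward = x @ theta_star + 0.1 * rng.normal()  # noisy immediate reward

    # Online update of the ridge-regression statistics.
    A += np.outer(x, x)
    b += reward * x

    cum_regret += (X @ theta_star).max() - x @ theta_star

print(f"average per-trial regret after {T} trials: {cum_regret / T:.4f}")
```

Regret here is the per-trial gap between the expected reward of the best available action and of the chosen one; under the linear-reward assumption the average regret of such a scheme shrinks as the estimate of the hidden function improves, which is the quantity the paper's upper and lower bounds characterize.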
Pages: 263-293
Page count: 31
Related Papers
50 items total; items [41]-[50] shown below
  • [41] MoleGuLAR: Molecule Generation Using Reinforcement Learning with Alternating Rewards
    Goel, Manan
    Raghunathan, Shampa
    Laghuvarapu, Siddhartha
    Priyakumar, U. Deva
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2021, 61 (12): 5815-5826
  • [42] Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
    Li, Siyuan
    Wang, Rui
    Tang, Minxue
    Zhang, Chongjie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [43] Multiple rewards fuzzy reinforcement learning algorithm in RoboCup environment
    Li, S
    Yao, JY
    Ye, Z
    Sun, ZQ
    PROCEEDINGS OF THE 2001 IEEE INTERNATIONAL CONFERENCE ON CONTROL APPLICATIONS (CCA'01), 2001: 317-322
  • [44] Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning
    Dietterich, Thomas
    Trimponias, George
    Chen, Zhitang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [45] Finding intrinsic rewards by embodied evolution and constrained reinforcement learning
    Uchibe, Eiji
    Doya, Kenji
    NEURAL NETWORKS, 2008, 21 (10): 1447-1455
  • [46] Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization
    Li, Siyao
    Lei, Deren
    Qin, Pengda
    Wang, William Yang
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 6038-6044
  • [47] Modular neural networks for reinforcement learning with temporal intrinsic rewards
    Takeuchi, Johane
    Shouno, Osamu
    Tsujino, Hiroshi
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007: 1151-1156
  • [48] Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards
    Chan, Hou Pong
    Chen, Wang
    Wang, Lu
    King, Irwin
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019: 2163-2174
  • [49] Erotic cue exposure increases physiological arousal, biases choices toward immediate rewards, and attenuates model-based reinforcement learning
    Mathar, David
    Wiebe, Annika
    Tuzsus, Deniz
    Knauth, Kilian
    Peters, Jan
    PSYCHOPHYSIOLOGY, 2023, 60 (12)
  • [50] The "rewards" of positive reinforcement
    Yin, Sophia
    VETERINARY TECHNICIAN, 2006, 27 (12): 792+