Reinforcement learning with immediate rewards and linear hypotheses

Cited by: 44
Authors
Abe, N
Biermann, AW
Long, PM
Institutions
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems;
DOI
10.1007/s00453-003-1038-1
Chinese Library Classification (CLC)
TP31 [Computer software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately and a linear function, unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of two binary-valued rewards is obtained in this way. For these cases we provide bounds on the per-trial regret of our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds showing that this rate of convergence is nearly optimal.
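The setting described in the abstract — immediate rewards (approximately) linear in a per-action feature vector — corresponds to what is now commonly called a linear contextual bandit. The following is a minimal illustrative sketch of that setting, not the authors' algorithm: it combines a ridge-regression estimate of the unknown weight vector with ε-greedy exploration, where the problem sizes, noise level, and exploration rate are all assumed for the example.

```python
import numpy as np

# Sketch of learning with immediate rewards and a linear hypothesis:
# the expected reward of an action is the inner product of an unknown
# weight vector w* with that action's feature vector.  Illustrative
# epsilon-greedy ridge-regression learner, NOT the paper's algorithm.

rng = np.random.default_rng(0)
d, n_actions, n_trials = 5, 4, 2000
w_star = rng.normal(size=d)              # unknown true linear function

A = np.eye(d)                            # ridge-regularized Gram matrix
b = np.zeros(d)                          # accumulated reward-weighted features
total_reward = 0.0

for t in range(n_trials):
    X = rng.normal(size=(n_actions, d))  # one feature vector per action
    w_hat = np.linalg.solve(A, b)        # current ridge estimate of w*
    if rng.random() < 0.1:               # explore with probability 0.1
        a = int(rng.integers(n_actions))
    else:                                # exploit the current estimate
        a = int(np.argmax(X @ w_hat))
    x = X[a]
    r = x @ w_star + 0.1 * rng.normal()  # immediate noisy reward
    A += np.outer(x, x)                  # rank-one statistics update
    b += r * x
    total_reward += r

estimation_error = float(np.linalg.norm(np.linalg.solve(A, b) - w_star))
```

As trials accumulate, the estimate of the linear function improves and the average per-trial regret shrinks, which is the qualitative behavior the paper's bounds quantify.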
Pages: 263 - 293
Page count: 31
Related Papers
50 records in total
  • [1] Reinforcement Learning with Immediate Rewards and Linear Hypotheses
    Naoki Abe
    Alan W. Biermann
    Philip M. Long
    Algorithmica , 2003, 37 : 263 - 293
  • [2] Reinforcement Learning with Perturbed Rewards
    Wang, Jingkang
    Liu, Yang
    Li, Bo
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6202 - 6209
  • [3] IMMEDIATE LEARNING REINFORCEMENT
    HAYES, RB
    AV COMMUNICATION REVIEW, 1966, 14 (03) : 377 - 381
  • [4] Learning Intrinsic Symbolic Rewards in Reinforcement Learning
    Sheikh, Hassam Ullah
    Khadka, Shauharda
    Miret, Santiago
    Majumdar, Somdeb
    Phielipp, Mariano
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [5] Online learning of shaping rewards in reinforcement learning
    Grzes, Marek
    Kudenko, Daniel
    NEURAL NETWORKS, 2010, 23 (04) : 541 - 550
  • [6] Reinforcement Learning With Temporal Logic Rewards
    Li, Xiao
    Vasile, Cristian-Ioan
    Belta, Calin
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 3834 - 3839
  • [7] Reinforcement Learning with Multiple Shared Rewards
    Guisi, Douglas M.
    Ribeiro, Richardson
    Teixeira, Marcelo
    Borges, Andre P.
    Enembreck, Fabricio
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 : 855 - 864
  • [8] Intermittent Reinforcement Learning with Sparse Rewards
    Sahoo, Prachi Pratyusha
    Vamvoudakis, Kyriakos G.
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 2709 - 2714
  • [9] Reinforcement Learning for Joint Optimization of Multiple Rewards
    Agarwal, Mridul
    Aggarwal, Vaneet
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [10] Detecting Rewards Deterioration in Episodic Reinforcement Learning
    Greenberg, Ido
    Mannor, Shie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139