Reinforcement learning with immediate rewards and linear hypotheses

Cited by: 44
Authors
Abe, N
Biermann, AW
Long, PM
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems;
DOI
10.1007/s00453-003-1038-1
CLC classification number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, in the setting where the consequence of each action is felt immediately and an unknown linear function (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which applying it gives the probability of receiving the larger of two binary-valued rewards. For these cases we provide bounds on the per-trial regret of our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds showing that the rates of convergence are nearly optimal.
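The continuous-reward setting above can be illustrated with a minimal simulation: an unknown weight vector generates each action's expected reward as a linear function of its feature vector, and the learner fits a least-squares estimate online while trading off exploration and exploitation. This is a simplified epsilon-greedy sketch for intuition only, not the authors' algorithm (the paper's methods and regret bounds differ); all names and the noise/exploration parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, T = 5, 10, 20000
w_true = rng.normal(size=d)        # unknown linear hypothesis
w_true /= np.linalg.norm(w_true)

gram = np.eye(d)                   # ridge-regularized Gram matrix: I + sum of x x^T
b = np.zeros(d)                    # sum of reward * x
regret = 0.0
for t in range(1, T + 1):
    X = rng.normal(size=(n_actions, d))     # this trial's feature vectors
    w_hat = np.linalg.solve(gram, b)        # least-squares estimate of w_true
    eps = t ** (-1.0 / 3.0)                 # decaying exploration probability
    if rng.random() < eps:
        a = int(rng.integers(n_actions))    # explore: random action
    else:
        a = int(np.argmax(X @ w_hat))       # exploit: greedy on the estimate
    reward = X[a] @ w_true + 0.1 * rng.normal()   # noisy linear reward
    gram += np.outer(X[a], X[a])            # online least-squares update
    b += reward * X[a]
    regret += (X @ w_true).max() - X[a] @ w_true  # per-trial regret increment

print(regret / T)   # average per-trial regret; shrinks as T grows
```

As the estimate of the weight vector sharpens and the exploration rate decays, the greedy choice is optimal more and more often, so the average per-trial regret tends to zero, which is the qualitative behavior the abstract's upper bounds formalize.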
Pages: 263-293
Page count: 31