Learning and planning in environments with delayed feedback

Cited by: 43
Authors
Walsh, Thomas J. [1]
Nouri, Ali [1]
Li, Lihong [1]
Littman, Michael L. [1]
Affiliation
[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
Funding
National Science Foundation (USA)
Keywords
Reinforcement learning; Delayed feedback; Markov decision processes
DOI
10.1007/s10458-008-9056-7
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This work considers the problems of learning and planning in Markovian environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed-observation environments.
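The abstract names a Model Based Simulation planner but does not spell out its steps. The following is only a minimal sketch of one plausible reading, not the authors' implementation: given a model of the undelayed environment, the agent rolls the last observed state forward through the actions whose effects it has not yet seen, taking the most likely successor at each step, and then acts with a policy computed for the undelayed problem. All names here (`chain_model`, `simulate_current_state`, `delayed_act`) and the toy chain world are hypothetical and introduced purely for illustration.

```python
def chain_model(n):
    """Hypothetical toy model: a deterministic chain of n states (0 .. n-1).

    model[(state, action)] maps each possible next state to its probability.
    """
    model = {}
    for s in range(n):
        model[(s, "right")] = {min(s + 1, n - 1): 1.0}
        model[(s, "left")] = {max(s - 1, 0): 1.0}
    return model


def simulate_current_state(model, observed_state, pending_actions):
    """Estimate the current state from a delayed observation.

    pending_actions are the actions taken since observed_state was generated,
    oldest first. Each step follows the most likely transition in the model
    (an assumption of this sketch).
    """
    state = observed_state
    for action in pending_actions:
        next_states = model[(state, action)]
        state = max(next_states, key=next_states.get)
    return state


def delayed_act(model, policy, observed_state, pending_actions):
    """Pick an action by applying an undelayed policy to the simulated state."""
    estimated = simulate_current_state(model, observed_state, pending_actions)
    return policy[estimated]


if __name__ == "__main__":
    model = chain_model(5)
    # Hypothetical undelayed policy: walk right until the last state, then step left.
    policy = {s: ("right" if s < 4 else "left") for s in range(5)}
    # Delay of 2 steps: the agent sees state 2 but has since moved right twice.
    print(delayed_act(model, policy, 2, ["right", "right"]))
```

With deterministic dynamics, as in this toy chain, the simulated state matches the true current state exactly; with stochastic dynamics the most-likely rollout is only an estimate.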
Pages: 83-105
Page count: 23
Related papers
50 in total
  • [1] Planning and learning in environments with delayed feedback
    Walsh, Thomas J.
    Nouri, Ali
    Li, Lihong
    Littman, Michael L.
    MACHINE LEARNING: ECML 2007, PROCEEDINGS, 2007, 4701 : 442 - +
  • [2] Learning and planning in environments with delayed feedback
    Thomas J. Walsh
    Ali Nouri
    Lihong Li
    Michael L. Littman
    Autonomous Agents and Multi-Agent Systems, 2009, 18 : 83 - 105
  • [3] Learning with Delayed Feedback
    Pranavan, Theivendiram
    Sim, Terence
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 4895 - 4902
  • [4] EFFECTS OF DELAYED INFORMATION FEEDBACK AND FEEDBACK CUES IN LEARNING ON DELAYED RETENTION
    SASSENRA.JM
    YONGE, GD
    JOURNAL OF EDUCATIONAL PSYCHOLOGY, 1969, 60 (03) : 174 - &
  • [5] Feedback of Delayed Rewards in XCS for Environments with Aliasing States
    Chen, Kuang-Yuan
    Lindsay, Peter A.
    ARTIFICIAL LIFE: BORROWING FROM BIOLOGY, PROCEEDINGS, 2009, 5865 : 252 - 261
  • [6] LEARNING FROM DELAYED FEEDBACK IN ADOLESCENCE
    Davidow, Juliet Y.
    Foerde, Karin
    Galvan, Adriana
    Shohamy, Daphna
    JOURNAL OF COGNITIVE NEUROSCIENCE, 2013, : 167 - 167
  • [7] DELAYED FEEDBACK IN STEERING DURING LEARNING AND TRANSFER OF LEARNING
    SMITH, KU
    SUSSMAN, HM
    JOURNAL OF APPLIED PSYCHOLOGY, 1970, 54 (04) : 334 - &
  • [8] LEARNING FROM FEEDBACK IN PROBABILISTIC ENVIRONMENTS
    KLAYMAN, J
    ACTA PSYCHOLOGICA, 1984, 56 (1-3) : 81 - 92
  • [9] Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments
    Wang, Dairui
    Cao, Junyu
    Zhang, Yan
    Qi, Wei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] Planning Collaborative Learning in Virtual Environments
    Hernandez-Selles, Nuria
    Gonzalez-Sanmamed, Mercedes
    Munoz-Carril, Pablo
    COMUNICAR, 2014, 21 (42) : 25 - 33