Prediction and Control in Continual Reinforcement Learning

Cited by: 0
Authors
Anand, Nishanth [1 ,2 ]
Precup, Doina [1 ,3 ]
Affiliations
[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
[2] Mila Quebec AI Inst, Montreal, PQ, Canada
[3] DeepMind, London, England
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
GAME; GO;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Temporal difference (TD) learning is often used to update the estimate of the value function, which RL agents then use to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components that update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems.
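The decomposition described in the abstract admits a compact illustration. Below is a minimal tabular sketch in Python of one plausible reading of the two-timescale scheme: the agent acts on the sum V(s) = V_perm(s) + V_trans(s), the transient component absorbs TD errors quickly, and the permanent component slowly consolidates what the transient one has learned. The class name, the step sizes (alpha_transient, alpha_permanent), and the consolidate() schedule are illustrative assumptions, not the paper's exact update rules.

    import numpy as np

    class PermanentTransientValue:
        """Tabular value estimate decomposed as V = V_perm + V_trans.

        V_trans adapts quickly to the situation at hand; V_perm changes
        slowly and retains knowledge that persists across tasks. The
        update rules below are an illustrative sketch, not the paper's.
        """

        def __init__(self, n_states, gamma=0.99,
                     alpha_transient=0.5, alpha_permanent=0.05):
            self.v_perm = np.zeros(n_states)   # slow, persistent component
            self.v_trans = np.zeros(n_states)  # fast, adaptive component
            self.gamma = gamma
            self.alpha_transient = alpha_transient
            self.alpha_permanent = alpha_permanent

        def value(self, s):
            # The agent evaluates states with the combined estimate.
            return self.v_perm[s] + self.v_trans[s]

        def td_update(self, s, r, s_next, done):
            # Fast timescale: TD(0) on the combined value, with the
            # correction credited entirely to the transient component.
            target = r + (0.0 if done else self.gamma * self.value(s_next))
            self.v_trans[s] += self.alpha_transient * (target - self.value(s))

        def consolidate(self):
            # Slow timescale: fold a fraction of the transient knowledge
            # into the permanent component, then shrink the transient part
            # so it is free to adapt to the next situation.
            self.v_perm += self.alpha_permanent * self.v_trans
            self.v_trans *= 1.0 - self.alpha_permanent

In use, td_update would be called on every transition, while consolidate might run at task boundaries or on a slow schedule; this fast/slow division mirrors the complementary learning systems analogy drawn in the abstract.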
Pages: 39
Related Papers
50 items in total
  • [41] Continual learning, deep reinforcement learning, and microcircuits: a novel method for clever game playing
    Chang, O.
    Ramos, L.
    Morocho-Cayamcela, M. E.
    Armas, R.
    Zhinin-Vera, L.
    Multimedia Tools and Applications, 2025, 84(3): 1537-1559
  • [42] Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
    Pathmanathan, Pankayaraj
    Díaz-Rodríguez, Natalia
    Del Ser, Javier
    Cognitive Computation, 2024, 16: 425-453
  • [43] Reinforcement learning for quadrupedal locomotion with design of continual-hierarchical curriculum
    Kobayashi, Taisuke
    Sugino, Toshiki
    Engineering Applications of Artificial Intelligence, 2020, 95
  • [44] Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation
    Hafez, Muhammad Burhan
    Erekmen, Kerim
    arXiv
  • [45] Continual Deep Reinforcement Learning to Prevent Catastrophic Forgetting in Jamming Mitigation
    Nexcepta, Gaithersburg, MD, United States
    Proc. IEEE Mil. Commun. Conf. (MILCOM), 2024: 740-745
  • [46] Continual portfolio selection in dynamic environments via incremental reinforcement learning
    Liu, Shu
    Wang, Bo
    Li, Huaxiong
    Chen, Chunlin
    Wang, Zhi
    International Journal of Machine Learning and Cybernetics, 2023, 14(1): 269-279
  • [47] Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization
    Zhang, Tiantian
    Lin, Zichuan
    Wang, Yuxing
    Ye, Deheng
    Fu, Qiang
    Yang, Wei
    Wang, Xueqian
    Liang, Bin
    Yuan, Bo
    Li, Xiu
    IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(10): 14588-14602
  • [48] Exploiting experience accumulation in stock price prediction with continual learning
    Zhao, Cheng
    Hu, Ping
    Yao, Xiaomin
    Applied Soft Computing, 2025, 170
  • [49] Temporal Continual Learning with Prior Compensation for Human Motion Prediction
    Tang, Jianwei
    Sun, Jiangxin
    Lin, Xiaotong
    Zhang, Lifang
    Zheng, Wei-Shi
    Hu, Jian-Fang
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [50] Continual learning for seizure prediction via memory projection strategy
    Shi, Yufei
    Tang, Shishi
    Li, Yuxuan
    He, Zhipeng
    Tang, Shengsheng
    Wang, Ruixuan
    Zheng, Weishi
    Chen, Ziyi
    Zhou, Yi
    Computers in Biology and Medicine, 2024, 181