Learning continuous-time working memory tasks with on-policy neural reinforcement learning

Cited: 3
Authors
Zambrano, Davide [1 ,2 ,5 ,6 ,7 ]
Roelfsema, Pieter R. [3 ,4 ]
Bohte, Sander [5 ,6 ,7 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lab Intelligent Syst, Lausanne, Switzerland
[2] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[3] Netherlands Inst Neurosci, Amsterdam, Netherlands
[4] Vrije Univ Amsterdam, Dept Integrat Neurophysiol, Ctr Neurogen & Cognit Res, Amsterdam, Netherlands
[5] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[6] Univ Amsterdam, Swammerdam Inst Life Sci, Amsterdam, Netherlands
[7] Rijksuniv, Dept Comp Sci, Groningen, Netherlands
Funding
EU Horizon 2020; European Research Council;
Keywords
Reinforcement learning; Neural networks; Working memory; Selective attention; Continuous-time SARSA; Optimal decision-making; Striatonigral influence; Striatal functions; Basic process; Representations; Neurons; Disinhibition; Expression; Dopamine; Ganglia;
DOI
10.1016/j.neucom.2020.11.072
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An animal's ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit and fine-grained notion of time needed to study psychophysically relevant measures like reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units using 'attentional' feedback. We show that the CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time. This allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how the implementation of a separate accessory network for feedback allows the model to learn continuously, even in the case of significant transmission delays between the network's feedforward and feedback layers, and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT represents a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions. (c) 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
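The abstract describes the learning rule only at a high level. As a rough illustration of what 'on-policy SARSA learning in continuous time' amounts to once the dynamics are discretized into small steps, the sketch below shows a plain tabular SARSA loop whose discount factor is rescaled by the step size. This is a minimal sketch under assumptions, not the CT-AuGMEnT implementation: the env.step(action, dt) interface, the function name sarsa_small_timestep, and all parameter defaults are hypothetical, and the paper's working-memory units and attention-gated feedback are not modelled here.

# Minimal sketch (assumed, not the authors' code): epsilon-greedy tabular SARSA
# with a small simulation time step, illustrating the on-policy temporal-
# difference update that CT-AuGMEnT builds on. The memory units and
# attention-gated feedback 'tags' of the full model are omitted.
import numpy as np

def sarsa_small_timestep(env, n_states, n_actions, dt=0.02, alpha=0.1,
                         gamma_per_sec=0.9, epsilon=0.1, episodes=500):
    """Tabular on-policy SARSA; the discount is rescaled to the step size dt."""
    Q = np.zeros((n_states, n_actions))
    gamma = gamma_per_sec ** dt  # per-step discount for a step of dt seconds

    def select_action(s):
        # Epsilon-greedy behaviour policy (also the policy being evaluated).
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()                       # hypothetical env interface
        a = select_action(s)
        done = False
        while not done:
            s_next, reward, done = env.step(a, dt)  # advance the world by dt
            a_next = select_action(s_next)
            # On-policy TD target: bootstrap from the action actually chosen next.
            target = reward if done else reward + gamma * Q[s_next, a_next]
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s_next, a_next
    return Q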
Pages: 635-656
Page count: 22
Related papers
50 records in total
  • [41] Policy Optimization for Continuous Reinforcement Learning
    Zhao, Hanyang
    Tang, Wenpin
    Yao, David D.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [42] Output-feedback Quadratic Tracking Control of Continuous-time Systems by Using Off-policy Reinforcement Learning with Neural Networks Observer
    Meng, Qingqing
    Peng, Yunjian
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 1504 - 1509
  • [43] Neural network compression for reinforcement learning tasks
    Ivanov, Dmitry A.
    Larionov, Denis A.
    Maslennikov, Oleg V.
    Voevodin, Vladimir V.
    SCIENTIFIC REPORTS, 15 (1)
  • [44] H∞ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning
    Modares, Hamidreza
    Lewis, Frank L.
    Jiang, Zhong-Ping
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26 (10) : 2550 - 2562
  • [45] PREDICTIVE LEARNING ENABLES NEURAL NETWORKS TO LEARN COMPLEX WORKING MEMORY TASKS
    van der Plas, Thijs L.
    Vogels, Tim P.
    Manohar, Sanjay G.
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199, 2022, 199
  • [46] Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning
    Wiltzer, Harley
    Meger, David
    Bellemare, Marc G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [47] Continuous-time mean-variance portfolio selection: A reinforcement learning framework
    Wang, Haoran
    Zhou, Xun Yu
    MATHEMATICAL FINANCE, 2020, 30 (04) : 1273 - 1308
  • [48] Reinforcement learning for a class of continuous-time input constrained optimal control problems
    Yaghmaie, Farnaz Adib
    Braun, David J.
    AUTOMATICA, 2019, 99 : 221 - 227
  • [49] Event-triggered integral reinforcement learning for nonlinear continuous-time systems
    Zhang, Qichao
    Zhao, Dongbin
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 442 - 447
  • [50] Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning
    Wiltzer, Harley
    Meger, David
    Bellemare, Marc G.
    Proceedings of Machine Learning Research, 2022, 162 : 23832 - 23856