Learning continuous-time working memory tasks with on-policy neural reinforcement learning

Cited by: 3
Authors
Zambrano, Davide [1 ,2 ,5 ,6 ,7 ]
Roelfsema, Pieter R. [3 ,4 ]
Bohte, Sander [5 ,6 ,7 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lab Intelligent Syst, Lausanne, Switzerland
[2] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[3] Netherlands Inst Neurosci, Amsterdam, Netherlands
[4] Vrije Univ Amsterdam, Dept Integrat Neurophysiol, Ctr Neurogen & Cognit Res, Amsterdam, Netherlands
[5] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[6] Univ Amsterdam, Swammerdam Inst Life Sci, Amsterdam, Netherlands
[7] Rijksuniv Groningen, Dept Comp Sci, Groningen, Netherlands
Funding
EU Horizon 2020; European Research Council;
Keywords
Reinforcement learning; Neural networks; Working memory; Selective attention; Continuous-time SARSA; OPTIMAL DECISION-MAKING; STRIATONIGRAL INFLUENCE; STRIATAL FUNCTIONS; BASIC PROCESS; REPRESENTATIONS; NEURONS; DISINHIBITION; EXPRESSION; DOPAMINE; GANGLIA;
DOI
10.1016/j.neucom.2020.11.072
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
An animal's ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit and fine-grained notion of time needed to study psychophysically relevant measures like reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units using 'attentional' feedback. We show that the CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time. This allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how the implementation of a separate accessory network for feedback allows the model to learn continuously, even with significant transmission delays between the network's feedforward and feedback layers and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT represents a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions. (c) 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
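The abstract frames CT-AuGMEnT as on-policy SARSA learning in continuous time, with synaptic 'tags' updated by a globally broadcast reward-prediction error. Purely as a hedged illustration of that general idea, and not of the authors' model itself (which uses a neural network with working memory units and attentional feedback to form the tags), the sketch below shows tabular SARSA with eligibility traces discretised with a small time step; every name and parameter value here is an assumption made for the example.

```python
import numpy as np

# Minimal sketch (assumed parameters, not the paper's implementation):
# tabular on-policy SARSA with eligibility traces ("tags"), discretised
# with a small time step dt to approximate continuous-time learning.

n_states, n_actions = 10, 3
dt = 0.01                 # integration time step (assumed)
gamma = 0.99              # discount per unit time (assumed)
lam = 0.90                # trace decay per unit time (assumed)
alpha = 0.1               # learning rate (assumed)

Q = np.zeros((n_states, n_actions))   # action-value estimates
tags = np.zeros_like(Q)               # synaptic tags / eligibility traces

def select_action(state, eps=0.1):
    """Epsilon-greedy choice; on-policy because the same policy
    generates both behaviour and the SARSA update target."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa_update(s, a, r, s_next, a_next):
    """One SARSA(lambda)-style update over an interval of length dt."""
    global Q, tags
    # Raise per-unit-time constants to the power dt so the update is
    # (approximately) invariant to the chosen time step.
    g, l = gamma ** dt, lam ** dt
    delta = r * dt + g * Q[s_next, a_next] - Q[s, a]   # TD error (global RPE)
    tags *= g * l                                       # decay all existing tags
    tags[s, a] += 1.0                                   # tag the state-action just taken
    Q += alpha * delta * tags                           # tagged entries move with the RPE
```

Raising the per-unit-time discount and decay constants to the power dt is one common way to keep the update roughly invariant to the chosen time step; the paper's actual continuous-time formulation and network architecture should be consulted for the exact scheme.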
Pages: 635-656
Page count: 22
Related Papers
50 records in total
  • [31] Parallel Bootstrap-Based On-Policy Deep Reinforcement Learning for Continuous Fluid Flow Control Applications
    Viquerat, Jonathan
    Hachem, Elie
    FLUIDS, 2023, 8 (07)
  • [32] Dynamic Multiobjective Control for Continuous-Time Systems Using Reinforcement Learning
    Lopez, Victor G.
    Lewis, Frank L.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (07) : 2869 - 2874
  • [33] Achieving Mean-Variance Efficiency by Continuous-Time Reinforcement Learning
    Huang, Yilie
    Jia, Yanwei
    Zhou, Xunyu
    Proceedings of the 3rd ACM International Conference on AI in Finance, ICAIF 2022, 2022, : 377 - 385
  • [34] Achieving Mean-Variance Efficiency by Continuous-Time Reinforcement Learning
    Huang, Yilie
    Jia, Yanwei
    Zhou, Xun Yu
    3RD ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2022, 2022, : 377 - 385
  • [35] Efficient Exploration in Continuous-time Model-based Reinforcement Learning
    Treven, Lenart
    Hubotter, Jonas
    Sukhija, Bhavya
    Dorfler, Florian
    Krause, Andreas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [36] Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning
    Ahmed, Ibrahim
    Quinones-Grueiro, Marcos
    Biswas, Gautam
    IFAC PAPERSONLINE, 2020, 53 (02): : 13733 - 13738
  • [37] Competitive reinforcement learning in continuous control tasks
    Abramson, M
    Pachowicz, P
    Wechsler, H
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 1909 - 1914
  • [38] On-policy learning-based deep reinforcement learning assessment for building control efficiency and stability
    Lee, Joon-Yong
    Rahman, Aowabin
    Huang, Sen
    Smith, Amanda D.
    Katipamula, Srinivas
    SCIENCE AND TECHNOLOGY FOR THE BUILT ENVIRONMENT, 2022, 28 (09) : 1150 - 1165
  • [39] Two novel on-policy reinforcement learning algorithms based on TD(λ)-methods
    Wiering, Marco A.
    van Hasselt, Hado
    2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 280 - +
  • [40] Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning
    Gurumurthy, Swaminathan
    Manchester, Zachary
    Kolter, J. Zico
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211