Learning continuous-time working memory tasks with on-policy neural reinforcement learning

Cited by: 3
Authors
Zambrano, Davide [1 ,2 ,5 ,6 ,7 ]
Roelfsema, Pieter R. [3 ,4 ]
Bohte, Sander [5 ,6 ,7 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lab Intelligent Syst, Lausanne, Switzerland
[2] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[3] Netherlands Inst Neurosci, Amsterdam, Netherlands
[4] Vrije Univ Amsterdam, Dept Integrat Neurophysiol, Ctr Neurogen & Cognit Res, Amsterdam, Netherlands
[5] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[6] Univ Amsterdam, Swammerdam Inst Life Sci, Amsterdam, Netherlands
[7] Rijksuniv, Dept Comp Sci, Groningen, Netherlands
Funding
EU Horizon 2020; European Research Council;
Keywords
Reinforcement learning; Neural networks; Working memory; Selective attention; Continuous-time SARSA; OPTIMAL DECISION-MAKING; STRIATONIGRAL INFLUENCE; STRIATAL FUNCTIONS; BASIC PROCESS; REPRESENTATIONS; NEURONS; DISINHIBITION; EXPRESSION; DOPAMINE; GANGLIA;
DOI
10.1016/j.neucom.2020.11.072
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An animal's ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit and fine-grained notion of time needed to study psychophysically relevant measures such as reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units trained using 'attentional' feedback. We show that the CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time, which allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how a separate accessory network for feedback allows the model to learn continuously, even with significant transmission delays between the network's feedforward and feedback layers and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT is a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions. (c) 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
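The abstract describes CT-AuGMEnT as an on-policy, continuous-time SARSA learner with synaptic 'tags' (eligibility traces). The sketch below is a minimal, illustrative discretization of continuous-time SARSA(lambda) with linear function approximation; it is not the authors' model (the working memory units and attentional feedback of CT-AuGMEnT are omitted), and the environment interface, feature dimensions and hyperparameters (dt, tau_gamma, tau_lambda, alpha, epsilon) are assumptions chosen only for illustration.

```python
# Minimal sketch of a discretized continuous-time SARSA(lambda) update with
# eligibility traces ("tags"). Illustrative only; not the CT-AuGMEnT model.

import numpy as np

rng = np.random.default_rng(0)

n_features, n_actions = 8, 3
dt = 0.01                          # simulation time step in seconds (assumed)
tau_gamma, tau_lambda = 1.0, 0.5   # discount / trace time constants (assumed)
gamma = np.exp(-dt / tau_gamma)    # per-step discount for a step of length dt
lam = np.exp(-dt / tau_lambda)     # per-step trace decay
alpha, epsilon = 0.05, 0.1         # learning rate and exploration (assumed)

W = np.zeros((n_actions, n_features))  # linear Q-value weights
E = np.zeros_like(W)                    # eligibility traces ("tags")

def q_values(x):
    """Q(s, a) for all actions, linear in the feature vector x."""
    return W @ x

def select_action(x):
    """Epsilon-greedy action selection (on-policy)."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values(x)))

def run_episode(env, max_steps=1000):
    """One episode against a hypothetical environment exposing
    reset() -> x and step(a, dt) -> (x, reward_rate, done)."""
    global W, E
    E = np.zeros_like(W)
    x = env.reset()
    a = select_action(x)
    for _ in range(max_steps):
        x_next, r, done = env.step(a, dt)
        a_next = select_action(x_next)
        # On-policy SARSA TD error for a step of physical duration dt.
        target = r * dt + (0.0 if done else gamma * q_values(x_next)[a_next])
        delta = target - q_values(x)[a]
        # Decay all traces, then tag the features that drove the chosen action.
        E *= gamma * lam
        E[a] += x
        W += alpha * delta * E
        if done:
            break
        x, a = x_next, a_next
```

Defining the per-step discount as exp(-dt / tau_gamma) ties the effective discounting to physical time rather than to the number of simulation steps, so the update remains consistent as dt changes; this is one common way to discretize continuous-time TD learning, not necessarily the paper's exact formulation.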
Pages: 635 - 656
Page count: 22
Related papers
50 records in total
  • [1] Continuous-time on-policy neural reinforcement learning of working memory tasks
    Zambrano, Davide
    Roelfsema, Pieter R.
    Bohte, Sander M.
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
  • [2] Continuous-Time Spike-Based Reinforcement Learning for Working Memory Tasks
    Karamanis, Marios
    Zambrano, Davide
    Bohte, Sander
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT II, 2018, 11140 : 250 - 262
  • [3] On-policy concurrent reinforcement learning
    Banerjee, B
    Sen, S
    Peng, J
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2004, 16 (04) : 245 - 260
  • [4] Policy Gradient Reinforcement Learning for Parameterized Continuous-Time Optimal Control
    Yang, Xindi
    Zhang, Hao
    Wang, Zhuping
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 59 - 64
  • [5] CHOQUET REGULARIZATION FOR CONTINUOUS-TIME REINFORCEMENT LEARNING
    Han, Xia
    Wang, Ruodu
    Zhou, Xun Yu
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2023, 61 (05) : 2777 - 2801
  • [6] Tabu search exploration for on-policy reinforcement learning
    Abramson, M
    Wechsler, H
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 2910 - 2915
  • [7] Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning
    Zhu, Yuanyang
    Wang, Zhi
    Zhu, Yuanheng
    Chen, Chunlin
    Zhao, Dongbin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [8] Off-policy and on-policy reinforcement learning with the Tsetlin machine
    Gorji, Saeed Rahimi
    Granmo, Ole-Christoffer
    APPLIED INTELLIGENCE, 2023, 53 (08) : 8596 - 8613
  • [9] Neural H2 Control Using Continuous-Time Reinforcement Learning
    Perrusquia, Adolfo
    Yu, Wen
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (06) : 4485 - 4494