Learning continuous-time working memory tasks with on-policy neural reinforcement learning

Cited by: 3
Authors
Zambrano, Davide [1 ,2 ,5 ,6 ,7 ]
Roelfsema, Pieter R. [3 ,4 ]
Bohte, Sander [5 ,6 ,7 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lab Intelligent Syst, Lausanne, Switzerland
[2] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[3] Netherlands Inst Neurosci, Amsterdam, Netherlands
[4] Vrije Univ Amsterdam, Dept Integrat Neurophysiol, Ctr Neurogen & Cognit Res, Amsterdam, Netherlands
[5] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[6] Univ Amsterdam, Swammerdam Inst Life Sci, Amsterdam, Netherlands
[7] Rijksuniv, Dept Comp Sci, Groningen, Netherlands
Funding
EU Horizon 2020; European Research Council
Keywords
Reinforcement learning; Neural networks; Working memory; Selective attention; Continuous-time SARSA; OPTIMAL DECISION-MAKING; STRIATONIGRAL INFLUENCE; STRIATAL FUNCTIONS; BASIC PROCESS; REPRESENTATIONS; NEURONS; DISINHIBITION; EXPRESSION; DOPAMINE; GANGLIA;
DOI
10.1016/j.neucom.2020.11.072
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An animal's ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit and fine-grained notion of time needed to study psychophysically relevant measures like reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units using 'attentional' feedback. We show that the CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time. This allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how the implementation of a separate accessory network for feedback allows the model to learn continuously, even with significant transmission delays between the network's feedforward and feedback layers and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT represents a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions. © 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
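To make the learning rule named in the abstract concrete, the following is a minimal illustrative sketch (not the authors' code) of on-policy SARSA in small, discretized time steps with eligibility traces ("synaptic tags") updated by a global TD error, the rule family that CT-AuGMEnT builds on. The paper's network architecture, working memory units and attentional feedback are omitted, and every name, constant and the toy environment below are assumptions made for illustration only.
```python
# Sketch of time-stepped, on-policy SARSA with eligibility traces ("tags").
# All constants and the toy environment are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 10, 2
dt = 0.05                        # simulation time step in seconds (assumed)
tau = 1.0                        # discounting time constant (assumed)
gamma = np.exp(-dt / tau)        # per-step discount from continuous-time decay
lam = 0.8                        # eligibility-trace decay factor (assumed)
beta = 0.1                       # learning rate (assumed)

Q = np.zeros((n_states, n_actions))   # action values
tags = np.zeros_like(Q)               # eligibility traces ("synaptic tags")

def select_action(s, epsilon=0.05):
    """Exploration simplified to epsilon-greedy for this sketch."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

s, a = 0, select_action(0)
for step in range(5000):
    # Toy chain environment: action 1 moves right, reward at the final state.
    s_next = min(s + a, n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    a_next = select_action(s_next)

    # On-policy (SARSA) TD error for one time step of length dt.
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]

    # Decay all tags, then strengthen the tag of the visited state-action pair.
    tags *= gamma * lam
    tags[s, a] += 1.0

    # Tagged entries are updated in proportion to the global TD error.
    Q += beta * delta * tags

    if s_next == n_states - 1:
        tags[:] = 0.0                # episode end: reset traces and restart
        s, a = 0, select_action(0)
    else:
        s, a = s_next, a_next
```
The point of the sketch is the structure of the update: a single scalar TD error broadcast to all tagged state-action pairs, which is the tabular analogue of the tagged-synapse updates described in the abstract.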
Pages: 635 - 656
Page count: 22
Related papers
50 records in total
  • [21] Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control
    Cao, Wenhan
    Pan, Wei
    12th International Conference on Learning Representations, ICLR 2024, 2024,
  • [22] A learning result for continuous-time recurrent neural networks
    Sontag, ED
    SYSTEMS & CONTROL LETTERS, 1998, 34 (03) : 151 - 158
  • [23] A learning result for continuous-time recurrent neural networks
    Sontag, Eduardo D.
    Systems and Control Letters, 1998, 34 (03): : 151 - 158
  • [24] On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
    Zhang, Yiming
    Ross, Keith W.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [25] Offline Reinforcement Learning with On-Policy Q-Function Regularization
    Shi, Laixi
    Dadashi, Robert
    Chi, Yuejie
    Castro, Pablo Samuel
    Geist, Matthieu
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 455 - 471
  • [26] Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning
    Westenbroek, Tyler
    Mazumdar, Eric
    Fridovich-Keil, David
    Prabhu, Valmik
    Tomlin, Claire J.
    Sastry, S. Shankar
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 118 - 125
  • [27] Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
    Zhong, Rujie
    Zhang, Duohan
    Schafer, Lukas
    Albrecht, Stefano V.
    Hanna, Josiah P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [28] Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
    Wei Qing-Lai
    Song Rui-Zhuo
    Sun Qiu-Ye
    Xiao Wen-Dong
    Chinese Physics B, 2015, (09) : 151 - 156
  • [29] Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
    Wei Qing-Lai
    Song Rui-Zhuo
    Sun Qiu-Ye
    Xiao Wen-Dong
    CHINESE PHYSICS B, 2015, 24 (09)
  • [30] Deep Reinforcement Learning for Continuous-time Self-triggered Control
    Wang, Ran
    Takeuchi, Ibuki
    Kashima, Kenji
    IFAC PAPERSONLINE, 2021, 54 (14): : 203 - 208