Learning continuous-time working memory tasks with on-policy neural reinforcement learning

Cited by: 3
Authors
Zambrano, Davide [1 ,2 ,5 ,6 ,7 ]
Roelfsema, Pieter R. [3 ,4 ]
Bohte, Sander [5 ,6 ,7 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lab Intelligent Syst, Lausanne, Switzerland
[2] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[3] Netherlands Inst Neurosci, Amsterdam, Netherlands
[4] Vrije Univ Amsterdam, Dept Integrat Neurophysiol, Ctr Neurogen & Cognit Res, Amsterdam, Netherlands
[5] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[6] Univ Amsterdam, Swammerdam Inst Life Sci, Amsterdam, Netherlands
[7] Rijksuniv Groningen, Dept Comp Sci, Groningen, Netherlands
Funding
EU Horizon 2020; European Research Council;
Keywords
Reinforcement learning; Neural networks; Working memory; Selective attention; Continuous-time SARSA; OPTIMAL DECISION-MAKING; STRIATONIGRAL INFLUENCE; STRIATAL FUNCTIONS; BASIC PROCESS; REPRESENTATIONS; NEURONS; DISINHIBITION; EXPRESSION; DOPAMINE; GANGLIA;
DOI
10.1016/j.neucom.2020.11.072
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
An animal's ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit and fine-grained notion of time needed to study psychophysically relevant measures like reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units using 'attentional' feedback. We show that the CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time. This allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how the implementation of a separate accessory network for feedback allows the model to learn continuously, even with significant transmission delays between the network's feedforward and feedback layers and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT represents a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions. (c) 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
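The abstract frames CT-AuGMEnT as on-policy SARSA learning in continuous time, with synaptic 'tags' updated by a globally broadcast reward-prediction error. Purely as a hedged illustration of that general idea, and not of the authors' model itself (which uses a neural network with working memory units and attentional feedback to form the tags), the sketch below shows tabular SARSA with eligibility traces discretised with a small time step; every name and parameter value here is an assumption made for the example.

```python
import numpy as np

# Minimal sketch (assumed parameters, not the paper's implementation):
# tabular on-policy SARSA with eligibility traces ("tags"), discretised
# with a small time step dt to approximate continuous-time learning.

n_states, n_actions = 10, 3
dt = 0.01                 # integration time step (assumed)
gamma = 0.99              # discount per unit time (assumed)
lam = 0.90                # trace decay per unit time (assumed)
alpha = 0.1               # learning rate (assumed)

Q = np.zeros((n_states, n_actions))   # action-value estimates
tags = np.zeros_like(Q)               # synaptic tags / eligibility traces

def select_action(state, eps=0.1):
    """Epsilon-greedy choice; on-policy because the same policy
    generates both behaviour and the SARSA update target."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa_update(s, a, r, s_next, a_next):
    """One SARSA(lambda)-style update over an interval of length dt."""
    global Q, tags
    # Raise per-unit-time constants to the power dt so the update is
    # (approximately) invariant to the chosen time step.
    g, l = gamma ** dt, lam ** dt
    delta = r * dt + g * Q[s_next, a_next] - Q[s, a]   # TD error (global RPE)
    tags *= g * l                                       # decay all existing tags
    tags[s, a] += 1.0                                   # tag the state-action just taken
    Q += alpha * delta * tags                           # tagged entries move with the RPE
```

Raising the per-unit-time discount and decay constants to the power dt is one common way to keep the update roughly invariant to the chosen time step; the paper's actual continuous-time formulation and network architecture should be consulted for the exact scheme.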
Pages: 635-656
Page count: 22
Related Papers
50 records in total
  • [31] Parallel Bootstrap-Based On-Policy Deep Reinforcement Learning for Continuous Fluid Flow Control Applications
    Viquerat, Jonathan
    Hachem, Elie
    FLUIDS, 2023, 8 (07)
  • [32] Dynamic Multiobjective Control for Continuous-Time Systems Using Reinforcement Learning
    Lopez, Victor G.
    Lewis, Frank L.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (07) : 2869 - 2874
  • [33] Achieving Mean-Variance Efficiency by Continuous-Time Reinforcement Learning
    Huang, Yilie
    Jia, Yanwei
    Zhou, Xunyu
    Proceedings of the 3rd ACM International Conference on AI in Finance, ICAIF 2022, 2022, : 377 - 385
  • [34] Achieving Mean-Variance Efficiency by Continuous-Time Reinforcement Learning
    Huang, Yilie
    Jia, Yanwei
    Zhou, Xun Yu
    3RD ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2022, 2022, : 377 - 385
  • [35] Efficient Exploration in Continuous-time Model-based Reinforcement Learning
    Treven, Lenart
    Hubotter, Jonas
    Sukhija, Bhavya
    Dorfler, Florian
    Krause, Andreas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [36] Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning
    Ahmed, Ibrahim
    Quinones-Grueiro, Marcos
    Biswas, Gautam
    IFAC PAPERSONLINE, 2020, 53 (02): : 13733 - 13738
  • [37] Competitive reinforcement learning in continuous control tasks
    Abramson, M
    Pachowicz, P
    Wechsler, H
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 1909 - 1914
  • [38] On-policy learning-based deep reinforcement learning assessment for building control efficiency and stability
    Lee, Joon-Yong
    Rahman, Aowabin
    Huang, Sen
    Smith, Amanda D.
    Katipamula, Srinivas
    SCIENCE AND TECHNOLOGY FOR THE BUILT ENVIRONMENT, 2022, 28 (09) : 1150 - 1165
  • [39] Two novel on-policy reinforcement learning algorithms based on TD(λ)-methods
    Wiering, Marco A.
    van Hasselt, Hado
    2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 280 - +
  • [40] Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning
    Gurumurthy, Swaminathan
    Manchester, Zachary
    Kolter, J. Zico
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211