Learning continuous-time working memory tasks with on-policy neural reinforcement learning

Cited: 3
Authors
Zambrano, Davide [1 ,2 ,5 ,6 ,7 ]
Roelfsema, Pieter R. [3 ,4 ]
Bohte, Sander [5 ,6 ,7 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lab Intelligent Syst, Lausanne, Switzerland
[2] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[3] Netherlands Inst Neurosci, Amsterdam, Netherlands
[4] Vrije Univ Amsterdam, Dept Integrat Neurophysiol, Ctr Neurogen & Cognit Res, Amsterdam, Netherlands
[5] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[6] Univ Amsterdam, Swammerdam Inst Life Sci, Amsterdam, Netherlands
[7] Rijksuniv, Dept Comp Sci, Groningen, Netherlands
Funding
EU Horizon 2020; European Research Council;
Keywords
Reinforcement learning; Neural networks; Working memory; Selective attention; Continuous-time SARSA; Optimal decision-making; Striatonigral influence; Striatal functions; Basic process; Representations; Neurons; Disinhibition; Expression; Dopamine; Ganglia;
DOI
10.1016/j.neucom.2020.11.072
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An animal's ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit and fine-grained notion of time needed to study psychophysically relevant measures like reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units using 'attentional' feedback. We show that the CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time. This allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how the implementation of a separate accessory network for feedback allows the model to learn continuously, even in the case of significant transmission delays between the network's feedforward and feedback layers, and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT represents a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions. (c) 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
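The abstract describes the learning rule only at a high level. As a rough illustration of what 'on-policy SARSA learning in continuous time' amounts to once the dynamics are discretized into small steps, the sketch below shows a plain tabular SARSA loop whose discount factor is rescaled by the step size. This is a minimal sketch under assumptions, not the CT-AuGMEnT implementation: the env.step(action, dt) interface, the function name sarsa_small_timestep, and all parameter defaults are hypothetical, and the paper's working-memory units and attention-gated feedback are not modelled here.

# Minimal sketch (assumed, not the authors' code): epsilon-greedy tabular SARSA
# with a small simulation time step, illustrating the on-policy temporal-
# difference update that CT-AuGMEnT builds on. The memory units and
# attention-gated feedback 'tags' of the full model are omitted.
import numpy as np

def sarsa_small_timestep(env, n_states, n_actions, dt=0.02, alpha=0.1,
                         gamma_per_sec=0.9, epsilon=0.1, episodes=500):
    """Tabular on-policy SARSA; the discount is rescaled to the step size dt."""
    Q = np.zeros((n_states, n_actions))
    gamma = gamma_per_sec ** dt  # per-step discount for a step of dt seconds

    def select_action(s):
        # Epsilon-greedy behaviour policy (also the policy being evaluated).
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()                       # hypothetical env interface
        a = select_action(s)
        done = False
        while not done:
            s_next, reward, done = env.step(a, dt)  # advance the world by dt
            a_next = select_action(s_next)
            # On-policy TD target: bootstrap from the action actually chosen next.
            target = reward if done else reward + gamma * Q[s_next, a_next]
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s_next, a_next
    return Q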
Pages: 635-656
Page count: 22
Related papers
50 records in total
  • [41] Policy Optimization for Continuous Reinforcement Learning
    Zhao, Hanyang
    Tang, Wenpin
    Yao, David D.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [42] Output-feedback Quadratic Tracking Control of Continuous-time Systems by Using Off-policy Reinforcement Learning with Neural Networks Observer
    Meng, Qingqing
    Peng, Yunjian
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 1504 - 1509
  • [43] Neural network compression for reinforcement learning tasks
    Ivanov, Dmitry A.
    Larionov, Denis A.
    Maslennikov, Oleg V.
    Voevodin, Vladimir V.
    SCIENTIFIC REPORTS, 15 (1)
  • [44] H∞ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning
    Modares, Hamidreza
    Lewis, Frank L.
    Jiang, Zhong-Ping
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26 (10) : 2550 - 2562
  • [45] PREDICTIVE LEARNING ENABLES NEURAL NETWORKS TO LEARN COMPLEX WORKING MEMORY TASKS
    van der Plas, Thijs L.
    Vogels, Tim P.
    Manohar, Sanjay G.
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199, 2022, 199
  • [46] Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning
    Wiltzer, Harley
    Meger, David
    Bellemare, Marc G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [47] Continuous-time mean-variance portfolio selection: A reinforcement learning framework
    Wang, Haoran
    Zhou, Xun Yu
    MATHEMATICAL FINANCE, 2020, 30 (04) : 1273 - 1308
  • [48] Reinforcement learning for a class of continuous-time input constrained optimal control problems
    Yaghmaie, Farnaz Adib
    Braun, David J.
    AUTOMATICA, 2019, 99 : 221 - 227
  • [49] Event-triggered integral reinforcement learning for nonlinear continuous-time systems
    Zhang, Qichao
    Zhao, Dongbin
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 442 - 447
  • [50] Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning
    Wiltzer, Harley
    Meger, David
    Bellemare, Marc G.
    Proceedings of Machine Learning Research, 2022, 162 : 23832 - 23856