Striatal and Tegmental Neurons Code Critical Signals for Temporal-Difference Learning of State Value in Domestic Chicks

Cited by: 2
Authors
Wen, Chentao [1 ]
Ogura, Yukiko [2 ,3 ]
Matsushima, Toshiya [4 ]
Affiliations
[1] Hokkaido Univ, Grad Sch Life Sci, Sapporo, Hokkaido, Japan
[2] Hokkaido Univ, Grad Sch Med, Dept Psychiat, Sapporo, Hokkaido, Japan
[3] Japan Soc Promot Sci, Tokyo, Japan
[4] Hokkaido Univ, Fac Sci, Dept Biol, Sapporo, Hokkaido, Japan
Source
FRONTIERS IN NEUROSCIENCE, 2016, Vol. 10
Funding
Japan Society for the Promotion of Science
Keywords
reinforcement learning; temporal-difference learning; state value; striatum; tegmentum; domestic chicks; extinction learning; MIDBRAIN DOPAMINE NEURONS; TONICALLY ACTIVE NEURONS; REWARD PREDICTION ERROR; BASAL GANGLIA; GALLUS-DOMESTICUS; VENTRAL STRIATUM; EFFERENT CONNECTIONS; LOBUS PAROLFACTORIUS; SOCIAL FACILITATION; ANTERIOR CINGULATE;
DOI
10.3389/fnins.2016.00476
CLC number
Q189 [Neuroscience]
Subject classification code
071006
Abstract
To ensure survival, animals must update their internal representations of the environment in a trial-and-error fashion. Psychological studies of associative learning and neurophysiological analyses of dopaminergic neurons suggest that this updating process involves the temporal-difference (TD) method in the basal ganglia network. However, how the component variables of the TD method are implemented at the neuronal level remains unclear. To investigate the underlying neural mechanisms, we trained domestic chicks to associate color cues with food rewards. We recorded neuronal activity from the medial striatum or tegmentum in freely behaving chicks and examined how reward omission changed neuronal firing. To compare neuronal activity with the signals assumed in the TD method, we simulated the behavioral task as a finite sequence of discrete time steps. The simulated task assumed three signals: the prediction signal, the target signal for updating, and the TD-error signal. In both the medial striatum and the tegmentum, the majority of recorded neurons fell into three types according to their goodness of fit to the three model signals, although these neurons tended to form a continuum without distinct differences in firing rate. Specifically, two types of striatal neurons successfully mimicked the target signal and the prediction signal, and a linear summation of these two striatal types fit well the activity of one type of tegmental neuron that mimicked the TD-error signal. The present study thus demonstrates that the striatum and tegmentum can convey the signals critically required for the TD method. Based on these theoretical and neurophysiological findings, together with tract-tracing data, we propose a novel model of how the convergence of signals represented in the striatum could lead to the computation of TD error in tegmental dopaminergic neurons.
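The TD scheme described in the abstract can be sketched as a minimal TD(0) simulation of the cue-reward task, including the reward-omission (extinction) probe. This is an illustrative sketch only: the state names, discount factor, learning rate, and trial counts below are assumptions, not the paper's actual task parameters or model fits.

```python
# Minimal TD(0) sketch of a cue -> delay -> reward trial, as a finite
# sequence of discrete time steps. All parameters are illustrative.
GAMMA = 0.9   # temporal discount factor (assumed)
ALPHA = 0.1   # learning rate (assumed)

def run_trial(V, states, rewards):
    """Update state values V over one trial; return per-step TD errors."""
    td_errors = []
    for t, s in enumerate(states):
        # Value of the next state (0 beyond the end of the trial).
        v_next = V.get(states[t + 1], 0.0) if t + 1 < len(states) else 0.0
        target = rewards[t] + GAMMA * v_next   # "target signal" for updating
        delta = target - V.get(s, 0.0)         # TD-error signal
        V[s] = V.get(s, 0.0) + ALPHA * delta   # V(s) is the prediction signal
        td_errors.append(delta)
    return td_errors

V = {}
states = ["cue", "delay", "food"]
for _ in range(200):                           # rewarded training trials
    run_trial(V, states, [0.0, 0.0, 1.0])
# After training, the cue predicts the upcoming reward: V["cue"] > 0.
deltas = run_trial(V, states, [0.0, 0.0, 0.0])  # reward-omission trial
# Omission produces a negative TD error at the expected reward step,
# the signature probed by the extinction manipulation in the study.
```

Here the prediction signal is the learned state value V(s), the target signal is r + γ·V(s'), and the TD error is their difference, matching the three signals the recorded neurons were compared against.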
Pages: 25