Neural Combinatorial Learning of Goal-directed Behavior with Reservoir Critic and Reward Modulated Hebbian Plasticity

Cited by: 3
Authors
Dasgupta, Sakyasingha [1]
Woergoetter, Florentin [1]
Morimoto, Jun [2]
Manoonpong, Poramate [1]
Affiliations
[1] Univ Gottingen, BCCN, Friedrich Hund Pl 1, D-37077 Gottingen, Germany
[2] ATR Computat Neurosci Lab, Kyoto 6190288, Japan
Keywords
Reinforcement learning; Reservoir networks; Correlation learning; Temporal memory
DOI
10.1109/SMC.2013.174
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Learning of goal-directed behaviors in biological systems is broadly based on associations between conditional and unconditional stimuli. This can be further classified as classical conditioning (correlation-based learning) and operant conditioning (reward-based learning). Although the two are traditionally modeled as separate learning systems in artificial agents, numerous animal experiments point towards their cooperative role in behavioral learning. Based on this concept, the recently introduced framework of neural combinatorial learning combines the two systems, which run in parallel to guide the overall learned behavior. Such combinatorial learning yields a faster and more efficient learner. In this work, we further improve the framework by applying a reservoir computing network (RC) as an adaptive critic unit and reward modulated Hebbian plasticity. Using a mobile robot system for goal-directed behavior learning, we demonstrate that the reservoir critic outperforms traditional radial basis function (RBF) critics in terms of stability of convergence and learning time. Furthermore, the temporal memory in the RC allows the system to learn a partially observable Markov decision process scenario, in contrast to a memoryless RBF critic.
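As a rough illustration of what a reservoir used as an adaptive critic might look like, the sketch below pairs a small leaky recurrent reservoir with a linear value readout trained by a TD(0) rule. It is not the authors' implementation: the class name ReservoirCritic, the network size, the leak rate, the row-wise weight scaling, and the choice of TD(0) for the readout are all assumptions made for this example, and the reward modulated Hebbian part of the framework is not shown.

```python
import math
import random


class ReservoirCritic:
    """Leaky echo-state-style reservoir with a linear value readout (illustrative)."""

    def __init__(self, n_inputs, n_units=20, leak=0.5, scale=0.5,
                 gamma=0.9, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed random input weights (never trained).
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_units)]
        # Fixed random recurrent weights. Each row is rescaled so its absolute
        # sum equals `scale` < 1, a simple sufficient condition for fading
        # memory (an assumption of this sketch, not taken from the paper).
        self.w_rec = []
        for _ in range(n_units):
            row = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
            norm = sum(abs(w) for w in row)
            self.w_rec.append([scale * w / norm for w in row])
        self.w_out = [0.0] * n_units  # readout weights: the only trained part
        self.state = [0.0] * n_units
        self.leak, self.gamma, self.lr = leak, gamma, lr

    def value(self, state=None):
        """Linear readout: estimated value of a reservoir state."""
        state = self.state if state is None else state
        return sum(w * x for w, x in zip(self.w_out, state))

    def _next_state(self, inputs):
        """Leaky-integrator update driven by the inputs and the previous state."""
        new_state = []
        for i, x in enumerate(self.state):
            drive = sum(w * u for w, u in zip(self.w_in[i], inputs))
            drive += sum(w * s for w, s in zip(self.w_rec[i], self.state))
            new_state.append((1.0 - self.leak) * x + self.leak * math.tanh(drive))
        return new_state

    def step(self, inputs, reward):
        """Advance one step, apply a TD(0) readout update, return the TD error."""
        prev_state = self.state
        self.state = self._next_state(inputs)
        td_error = reward + self.gamma * self.value() - self.value(prev_state)
        self.w_out = [w + self.lr * td_error * x
                      for w, x in zip(self.w_out, prev_state)]
        return td_error
```

Calling step([1.0, 0.0], reward=0.0) followed by step([0.0, 0.0], reward=0.0) leaves a nonzero reservoir state even though the second input is empty. That lingering trace is the kind of temporal memory a memoryless RBF critic lacks, since its value estimate depends only on the current input.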
Pages: 993-1000
Page count: 8