Predictive representations can link model-based reinforcement learning to model-free mechanisms

Cited by: 133
|
Authors
Russek, Evan M. [1 ]
Momennejad, Ida [2 ,3 ]
Botvinick, Matthew M. [4 ,5 ]
Gershman, Samuel J. [6 ,7 ]
Daw, Nathaniel D. [2 ,3 ]
Affiliations
[1] NYU, Ctr Neural Sci, New York, NY 10003 USA
[2] Princeton Univ, Princeton Neurosci Inst, Princeton, NJ 08544 USA
[3] Princeton Univ, Dept Psychol, Princeton, NJ 08544 USA
[4] DeepMind, London, England
[5] UCL, Gatsby Computat Neurosci Unit, London, England
[6] Harvard Univ, Dept Psychol, 33 Kirkland St, Cambridge, MA 02138 USA
[7] Harvard Univ, Ctr Brain Sci, Cambridge, MA 02138 USA
Funding
US National Institutes of Health;
Keywords
BASAL GANGLIA; PREFRONTAL CORTEX; COGNITIVE MAP; SUCCESSOR REPRESENTATION; ORBITOFRONTAL CORTEX; DOPAMINE NEURONS; TASK; REWARD; HIPPOCAMPUS; EXPERIENCE;
DOI
10.1371/journal.pcbi.1005768
Chinese Library Classification
Q5 [Biochemistry];
Subject Classification Code
071010; 081704;
Abstract
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
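The abstract's core idea can be made concrete with a small sketch (illustrative only, not the authors' code): learn the successor representation M(s, s') — the expected discounted future occupancy of s' starting from s — via TD updates, then read out values as V(s) = Σ_s' M(s, s') R(s'). The chain environment and all parameter names here are hypothetical.

```python
import numpy as np

def td_learn_sr(transitions, n_states, gamma=0.95, alpha=0.1, n_passes=2000):
    """TD(0) learning of the SR matrix from observed (s, s_next) pairs."""
    M = np.eye(n_states)  # each state trivially predicts itself at t = 0
    for _ in range(n_passes):
        for s, s_next in transitions:
            one_hot = np.eye(n_states)[s]
            # SR TD error: current occupancy plus discounted successor
            # occupancies of the next state, minus the current estimate
            delta = one_hot + gamma * M[s_next] - M[s]
            M[s] += alpha * delta
    return M

# Deterministic 4-state chain: 0 -> 1 -> 2 -> 3, with 3 looping to itself
transitions = [(0, 1), (1, 2), (2, 3), (3, 3)]
M = td_learn_sr(transitions, n_states=4)

# Values follow from a single matrix-vector product; changing the reward
# vector R revalues every state immediately, without relearning M — the
# reward-revaluation sensitivity the abstract associates with
# model-based behavior.
R = np.array([0.0, 0.0, 0.0, 1.0])
V = M @ R
```

Because M caches long-run state predictions, this readout is cheaper at decision time than the full tree search of dynamic programming, which is the trade-off the abstract highlights.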
Pages: 35
Related Papers
50 total
  • [21] The modulation of acute stress on model-free and model-based reinforcement learning in gambling disorder
    Wyckmans, Florent
    Banerjee, Nilosmita
    Saeremans, Melanie
    Otto, Ross
    Kornreich, Charles
    Vanderijst, Laetitia
    Gruson, Damien
    Carbone, Vincenzo
    Bechara, Antoine
    Buchanan, Tony
    Noel, Xavier
    [J]. JOURNAL OF BEHAVIORAL ADDICTIONS, 2022, 11 (03) : 831 - 844
  • [22] Model-based decision making and model-free learning
    Drummond, Nicole
    Niv, Yael
    [J]. CURRENT BIOLOGY, 2020, 30 (15) : R860 - R865
  • [23] Model-Free and Model-Based Active Learning for Regression
    O'Neill, Jack
    Delany, Sarah Jane
    MacNamee, Brian
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS, 2017, 513 : 375 - 386
  • [24] Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing
    Yang, Max
    Lin, Yijiong
    Church, Alex
    Lloyd, John
    Zhang, Dandan
    Barton, David A. W.
    Lepora, Nathan F.
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (09) : 5480 - 5487
  • [25] Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems
    Gao, Cheng
    Wang, Dan
    [J]. JOURNAL OF BUILDING ENGINEERING, 2023, 74
  • [26] Combining Model-Based and Model-Free Reinforcement Learning Policies for More Efficient Sepsis Treatment
    Liu, Xiangyu
    Yu, Chao
    Huang, Qikai
    Wang, Luhao
    Wu, Jianfeng
    Guan, Xiangdong
    [J]. BIOINFORMATICS RESEARCH AND APPLICATIONS, ISBRA 2021, 2021, 13064 : 105 - 117
  • [27] Model-based and model-free mechanisms in methamphetamine use disorder
    Robinson, Alex H.
    Mahlberg, Justin
    Chong, Trevor T. -J.
    Verdejo-Garcia, Antonio
    [J]. ADDICTION BIOLOGY, 2024, 29 (01)
  • [28] Ventral Striatum and Orbitofrontal Cortex Are Both Required for Model-Based, But Not Model-Free, Reinforcement Learning
    McDannald, Michael A.
    Lucantonio, Federica
    Burke, Kathryn A.
    Niv, Yael
    Schoenbaum, Geoffrey
    [J]. JOURNAL OF NEUROSCIENCE, 2011, 31 (07): 2700 - 2705
  • [29] Model-Based Transfer Reinforcement Learning Based on Graphical Model Representations
    Sun, Yuewen
    Zhang, Kun
    Sun, Changyin
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (02) : 1035 - 1048
  • [30] DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
    Deng, Fei
    Jang, Ingook
    Ahn, Sungjin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,