Reinforcement learning and the reward positivity with aversive outcomes

Cited: 0
Authors
Bauer, Elizabeth A. [1 ,2 ]
Watanabe, Brandon K. [1 ]
Macnamara, Annmarie [1 ]
Affiliations
[1] Texas A&M University, Department of Psychological & Brain Sciences, College Station, TX, USA
[2] Texas A&M University, Department of Psychological & Brain Sciences, 4235 TAMU, College Station, TX 77843, USA
Keywords
ERP; punishment; reinforcement learning; reward positivity (RewP); prediction error; dopamine; feedback; meta-analysis; potentials; negativity; P300; PCA
DOI
10.1111/psyp.14460
Chinese Library Classification
B84 [Psychology]
Discipline Codes
04; 0402
Abstract
The reinforcement learning (RL) theory of the reward positivity (RewP), an event-related potential (ERP) component that indexes reward responsivity, holds that the RewP should be largest when positive outcomes are unexpected, a prediction supported by work using appetitive outcomes (e.g., money). However, the RewP can also be elicited by the absence of aversive outcomes (e.g., shock). The limited work to date that has manipulated expectancy using aversive outcomes has not supported the predictions of RL theory, but this work has been difficult to reconcile with the appetitive literature because the RewP was not observed as a reward signal in these studies, which used passive tasks that did not involve participant choice. Here, we tested the predictions of RL theory by manipulating expectancy in an active/choice-based threat-of-shock doors task that had previously been found to elicit the RewP as a reward signal. Moreover, we used principal components analysis to isolate the RewP from overlapping ERP components. Eighty participants viewed pairs of doors surrounded by a red or green border; shock delivery was expected (80% probability) following red-bordered doors and unexpected (20% probability) following green-bordered doors. The RewP was observed as a reward signal (i.e., no shock > shock) that was not potentiated for unexpected feedback. In addition, the RewP was larger overall for unexpected (vs. expected) feedback. Therefore, the RewP appears to reflect the additive (not interactive) effects of reward and expectancy, challenging the RL theory of the RewP, at least when reward is defined as the absence of an aversive outcome.
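The additive-versus-interactive distinction at the heart of the abstract can be made concrete with a small numeric sketch. The Python snippet below is illustrative only and not from the paper: it uses the study's 80%/20% shock contingencies, codes shock as -1 and shock omission as 0 (so "reward" is the absence of shock), and the weights b_reward and b_surprise are hypothetical. The RL (prediction-error) account predicts an outcome-by-expectancy interaction, with the largest RewP for unexpected shock omission; the additive account supported by the data predicts independent main effects of reward and surprise.

```python
# Illustrative sketch (not from the paper) of the two accounts of the RewP.
# Outcomes are coded shock = -1, no shock = 0; "reward" = shock omission.

# Shock probability for each cue, per the study's stated contingencies
p_shock = {"red": 0.8, "green": 0.2}          # red border: shock expected
V = {cue: -p for cue, p in p_shock.items()}   # learned value: V_red = -0.8

def rpe_model(outcome, cue, scale=1.0):
    """RL (prediction-error) account: RewP tracks delta = r - V(cue).
    Predicts an outcome x expectancy INTERACTION, largest for
    unexpected shock omission."""
    return scale * (outcome - V[cue])

def additive_model(outcome, cue, b_reward=1.0, b_surprise=0.5):
    """Additive account supported by the data: independent main effects
    of reward (no shock > shock) and surprise (unexpected > expected).
    b_reward and b_surprise are hypothetical weights for illustration."""
    p = p_shock[cue] if outcome == -1 else 1 - p_shock[cue]
    surprise = 1 - p                          # how unexpected the outcome was
    reward = 1.0 if outcome == 0 else 0.0     # shock omission = reward
    return b_reward * reward + b_surprise * surprise

for cue in ("red", "green"):
    for outcome, label in ((0, "no shock"), (-1, "shock")):
        print(f"{cue:5s} cue, {label:8s}: "
              f"RPE model = {rpe_model(outcome, cue):+.2f}, "
              f"additive model = {additive_model(outcome, cue):+.2f}")
```

Running the sketch shows the RPE model peaking only in the red-cue/no-shock cell (+0.80), whereas the additive model preserves no shock > shock at both expectancy levels and unexpected > expected within each outcome, i.e., two main effects with no interaction, matching the pattern reported in the abstract.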
Pages: 8