Reinforcement learning and the reward positivity with aversive outcomes

Cited by: 0

Authors:

Bauer, Elizabeth A. ^{[1,2]}

Watanabe, Brandon K. ^{[1]}

Macnamara, Annmarie ^{[1]}

Affiliations:

[1] Texas A&M Univ, Dept Psychol & Brain Sci, College Stn, TX USA

[2] Texas A&M Univ, Dept Psychol & Brain Sci, 4235 TAMU, College Stn, TX 77843 USA

Source:

PSYCHOPHYSIOLOGY | 2024 / Vol. 61 / Issue 04

Keywords:

ERP; punishment; reinforcement learning; reward positivity (RewP); PREDICTION ERROR; DOPAMINE; FEEDBACK; ERP; METAANALYSIS; POTENTIALS; NEGATIVITY; P300; PCA;

DOI:

10.1111/psyp.14460

Chinese Library Classification (CLC) number:

B84 [Psychology];

Discipline classification code:

04 ; 0402 ;

Abstract:

The reinforcement learning (RL) theory of the reward positivity (RewP), an event-related potential (ERP) component that measures reward responsivity, suggests that the RewP should be largest when positive outcomes are unexpected and has been supported by work using appetitive outcomes (e.g., money). However, the RewP can also be elicited by the absence of aversive outcomes (e.g., shock). The limited work to-date that has manipulated expectancy while using aversive outcomes has not supported the predictions of RL theory. Nonetheless, this work has been difficult to reconcile with the appetitive literature because the RewP was not observed as a reward signal in these studies, which used passive tasks that did not involve participant choice. Here, we tested the predictions of the RL theory by manipulating expectancy in an active/choice-based threat-of-shock doors task that was previously found to elicit the RewP as a reward signal. Moreover, we used principal components analysis to isolate the RewP from overlapping ERP components. Eighty participants viewed pairs of doors surrounded by a red or green border; shock delivery was expected (80%) following red-bordered doors and unexpected (20%) following green-bordered doors. The RewP was observed as a reward signal (i.e., no shock > shock) that was not potentiated for unexpected feedback. In addition, the RewP was larger overall for unexpected (vs expected) feedback. Therefore, the RewP appears to reflect the additive (not interactive) effects of reward and expectancy, challenging the RL theory of the RewP, at least when reward is defined as the absence of an aversive outcome.
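The interaction that RL theory predicts for this design can be illustrated with a small worked example. The sketch below is not the authors' analysis. It assumes a simple value coding (shock = -1, no shock = 0) that does not appear in the abstract, and the function names are hypothetical. Only the shock probabilities (80% after red-bordered doors, 20% after green-bordered doors) come from the abstract. Under these assumptions, the signed prediction error difference (no shock minus shock) is larger for unexpected than for expected feedback, which is the potentiation that the reported data did not show.

```python
# Assumed outcome values (illustrative only, not from the paper).
SHOCK_VALUE = -1.0
NO_SHOCK_VALUE = 0.0

# Shock probability following each border colour, as stated in the abstract.
P_SHOCK = {"red": 0.8, "green": 0.2}


def prediction_error(cue, shocked):
    """Signed prediction error: obtained value minus the value expected for the cue."""
    p = P_SHOCK[cue]
    expected = p * SHOCK_VALUE + (1 - p) * NO_SHOCK_VALUE
    obtained = SHOCK_VALUE if shocked else NO_SHOCK_VALUE
    return obtained - expected


def predicted_rewp(expectancy):
    """No-shock minus shock prediction error for one level of expectancy."""
    if expectancy == "unexpected":
        # Unexpected feedback: no shock after red, shock after green.
        return prediction_error("red", False) - prediction_error("green", True)
    if expectancy == "expected":
        # Expected feedback: no shock after green, shock after red.
        return prediction_error("green", False) - prediction_error("red", True)
    raise ValueError("expectancy must be 'unexpected' or 'expected'")


if __name__ == "__main__":
    for level in ("unexpected", "expected"):
        print(level, round(predicted_rewp(level), 3))
```

With this coding, the predicted difference is about 1.6 for unexpected feedback and about 0.4 for expected feedback. A purely additive pattern, as reported in the abstract, would instead show the same no shock minus shock difference at both expectancy levels.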


Pages: 8
