The gradient of the reinforcement landscape influences sensorimotor learning

Cited: 32
Authors
Cashaback, Joshua G. A. [1 ,2 ]
Lao, Christopher K. [3 ]
Palidis, Dimitrios J. [4 ,5 ,6 ]
Coltman, Susan K. [4 ,5 ,6 ]
McGregor, Heather R. [4 ,5 ,6 ]
Gribble, Paul L. [3 ,5 ,6 ,7 ]
Affiliations
[1] Univ Calgary, Human Performance Lab, Calgary, AB, Canada
[2] Univ Calgary, Hotchkiss Brain Inst, Calgary, AB, Canada
[3] Western Univ, Dept Physiol & Pharmacol, London, ON, Canada
[4] Western Univ, Grad Program Neurosci, London, ON, Canada
[5] Western Univ, Brain & Mind Inst, London, ON, Canada
[6] Western Univ, Dept Psychol, London, ON, Canada
[7] Haskins Labs Inc, New Haven, CT 06511 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
TASK-IRRELEVANT; DECISION-THEORY; MOTOR; ADAPTATION; MOVEMENT; VARIABILITY; REWARD; REPRESENTATION; MEMORY;
DOI
10.1371/journal.pcbi.1006839
Chinese Library Classification: Q5 [Biochemistry]
Discipline codes: 071010; 081704
Abstract
Consideration of previous successes and failures is essential to mastering a motor skill. Much of what we know about how humans and animals learn from such reinforcement feedback comes from experiments that involve sampling from a small number of discrete actions. Yet it is less well understood how we learn through reinforcement feedback when sampling from a continuous set of possible actions. Navigating a continuous set of possible actions likely requires using gradient information to maximize success. Here we addressed how humans adapt the aim of their hand when experiencing reinforcement feedback associated with a continuous set of possible actions. Specifically, we manipulated the change in the probability of reward given a change in motor action (the reinforcement gradient) to study its influence on learning. We found that participants learned faster when exposed to a steep gradient than to a shallow gradient. Further, when initially positioned between a steep and a shallow gradient that rose in opposite directions, participants were more likely to ascend the steep gradient. We introduce a model that captures our results and several features of motor learning. Taken together, our work suggests that the sensorimotor system relies on temporally recent and spatially local gradient information to drive learning.

Author summary
In recent years it has been shown that reinforcement feedback may also subserve our ability to acquire new motor skills. Here we address how the reinforcement gradient influences motor learning. We found that a steeper gradient increased both the rate and likelihood of learning. Moreover, while many mainstream theories posit that we build a full representation of the reinforcement landscape, both our data and model suggest that the sensorimotor system relies primarily on temporally recent and spatially local gradient information to drive learning. Our work provides new insights into how we sample from a continuous action-reward landscape to maximize success.
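The abstract's core claim (that a steeper reinforcement gradient speeds learning, and that temporally recent, spatially local information suffices) can be illustrated with a toy simulation. This is not the authors' model; the linear reward landscape, the success-stay update rule, and all parameter values below are invented for illustration only.

```python
import random

def reward_probability(aim, slope):
    """Linear 'reinforcement landscape': chance of binary reward as a
    function of hand aim. slope plays the role of the reinforcement
    gradient; the numbers are illustrative, not from the paper."""
    return max(0.0, min(1.0, 0.5 + slope * aim))

def simulate(slope, trials=500, explore_sd=0.05, seed=0):
    """Learner using only the most recent, local outcome: adopt an
    exploratory aim if it was rewarded, otherwise keep the old aim.
    No global representation of the landscape is ever built.
    Returns the fraction of rewarded trials."""
    rng = random.Random(seed)
    aim, hits = 0.0, 0
    for _ in range(trials):
        probe = aim + rng.gauss(0.0, explore_sd)  # motor exploration
        if rng.random() < reward_probability(probe, slope):
            aim = probe   # rewarded: keep the explored action
            hits += 1
        # unrewarded: aim is unchanged (temporally recent, local rule)
    return hits / trials

# A steeper gradient should yield a higher overall reward rate,
# averaged over several random seeds.
steep = sum(simulate(5.0, seed=s) for s in range(10)) / 10
shallow = sum(simulate(0.5, seed=s) for s in range(10)) / 10
```

Even this crude success-stay rule climbs the steep landscape faster, because larger reward-probability differences across nearby aims make rewarded exploration more informative per trial.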
Pages: 27
Related papers (50 total)
  • [31] Evolution-Guided Policy Gradient in Reinforcement Learning
    Khadka, Shauharda
    Tumer, Kagan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [32] Policy Gradient using Weak Derivatives for Reinforcement Learning
    Bhatt, Sujay
    Koppel, Alec
    Krishnamurthy, Vikram
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 5531 - 5537
  • [33] Total stochastic gradient algorithms and applications in reinforcement learning
    Parmas, Paavo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [34] Variance reduction techniques for gradient estimates in reinforcement learning
    Greensmith, E
    Bartlett, PL
    Baxter, J
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 5 : 1471 - 1530
  • [35] Variance reduction techniques for gradient estimates in reinforcement learning
    Greensmith, E
    Bartlett, PL
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1507 - 1514
  • [36] MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server
    Cui, Guoxin
    Xu, Jun
    Zeng, Wei
    Lan, Yanyan
    Guo, Jiafeng
    Cheng, Xueqi
    PROCEEDINGS OF THE 2018 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'18), 2018, : 83 - 90
  • [37] Inverse Reinforcement Learning through Policy Gradient Minimization
    Pirotta, Matteo
    Restelli, Marcello
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1993 - 1999
  • [39] Policy gradient methods for reinforcement learning with function approximation
    Sutton, RS
    McAllester, D
    Singh, S
    Mansour, Y
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1057 - 1063
  • [40] Fuzzy Baselines to Stabilize Policy Gradient Reinforcement Learning
    Surita, Gabriela
    Lemos, Andre
    Gomide, Fernando
    EXPLAINABLE AI AND OTHER APPLICATIONS OF FUZZY TECHNIQUES, NAFIPS 2021, 2022, 258 : 436 - 446