Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL

Cited by: 0

Authors:

Sun, Chen [1]

Yang, Wannan [2]

Jiralerspong, Thomas [1]

Malenfant, Dane [3]

Alsbury-Nealy, Benjamin [4,5]

Bengio, Yoshua [1,6]

Richards, Blake [1,7]

Affiliations:

[1] Univ Montreal, Mila, Montreal, PQ, Canada

[2] NYU, New York, NY 10003 USA

[3] McGill Univ, Montreal, PQ, Canada

[4] Univ Toronto, Toronto, ON, Canada

[5] SilicoLabs Inc, Austin, TX USA

[6] CIFAR, Toronto, ON, Canada

[7] CIFAR, Learning Machines & Brains, Toronto, ON, Canada

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC)

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. These critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on these critical steps. This algorithm, which we call Contrastive Retrospection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task using a novel contrastive loss and delivers an intrinsic reward when the current state matches one of the prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (i) They enable rapid identification of all the critical steps. (ii) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon (and ignore other states) than it is to prospectively predict reward at every step taken. ConSpec greatly improves learning in a diverse set of RL tasks. The code is available at https://github.com/sunchipsster1/ConSpec.
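The abstract's idea of delivering an intrinsic reward when the current state matches a prototype can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes states and prototypes are already embedded as plain vectors, that "matching" means cosine similarity above a cutoff, and that the reward is a fixed bonus; the names `cosine_similarity`, `intrinsic_reward`, `threshold`, and `scale` are hypothetical, and learning the prototypes via the contrastive loss is not shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def intrinsic_reward(
    state: Vector,
    prototypes: Sequence[Vector],
    threshold: float = 0.6,  # assumed cutoff for "matches a prototype"
    scale: float = 1.0,      # assumed size of the intrinsic bonus
) -> float:
    """Return a bonus if the state embedding is close to any prototype, else 0.0."""
    best = max((cosine_similarity(state, p) for p in prototypes), default=0.0)
    return scale if best >= threshold else 0.0


if __name__ == "__main__":
    # Two hypothetical prototypes, e.g. "picked up key" and "opened door".
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    print(intrinsic_reward([0.9, 0.1], prototypes))    # close to the first prototype
    print(intrinsic_reward([-1.0, -1.0], prototypes))  # close to neither
```

In a full agent, a bonus like this would be added to the environment reward before the underlying RL update, which is consistent with the abstract's statement that ConSpec can be added to any existing RL algorithm.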


Pages: 23
