Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL

Authors
Sun, Chen [1]
Yang, Wannan [2]
Jiralerspong, Thomas [1]
Malenfant, Dane [3]
Alsbury-Nealy, Benjamin [4, 5]
Bengio, Yoshua [1, 6]
Richards, Blake [1, 7]
Affiliations
[1] Univ Montreal, Mila, Montreal, PQ, Canada
[2] NYU, New York, NY 10003 USA
[3] McGill Univ, Montreal, PQ, Canada
[4] Univ Toronto, Toronto, ON, Canada
[5] SilicoLabs Inc, Austin, TX USA
[6] CIFAR, Toronto, ON, Canada
[7] CIFAR, Learning Machines & Brains, Toronto, ON, Canada
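Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract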
In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. These critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on these critical steps. This algorithm, which we call Contrastive Retrospection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task using a novel contrastive loss and delivers an intrinsic reward when the current state matches one of the prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (i) they enable rapid identification of all the critical steps; (ii) they do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon (while ignoring other states) than it is to prospectively predict reward at every step taken. ConSpec greatly improves learning in a diverse set of RL tasks. The code is available at https://github.com/sunchipsster1/ConSpec.
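The abstract's prototype-matching idea (an intrinsic reward is delivered when the current state matches a learned prototype) can be illustrated with a minimal sketch. This is not the authors' implementation. The use of cosine similarity, the `threshold` and `scale` parameters, and the function names are all assumptions made for illustration, and the contrastive loss that learns the prototypes is not shown.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def intrinsic_reward(
    state_embedding: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical match threshold, not from the paper
    scale: float = 1.0,      # hypothetical reward scale, not from the paper
) -> float:
    """Return a bonus when the state is close enough to any prototype.

    Assumption: "matches one of the prototypes" is modeled here as the
    best cosine similarity reaching `threshold`.
    """
    best = max(
        (cosine_similarity(state_embedding, p) for p in prototypes),
        default=0.0,
    )
    return scale * best if best >= threshold else 0.0


if __name__ == "__main__":
    # Two hand-written prototypes standing in for learned critical-step embeddings.
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    print(intrinsic_reward([0.9, 0.1], prototypes))  # close to the first prototype
    print(intrinsic_reward([1.0, 1.0], prototypes))  # matches neither closely
```

In a sketch like this, the bonus would be added to the environment reward before the update step of whatever base RL algorithm is in use, which is one way to read the abstract's statement that ConSpec can be added to any existing RL algorithm.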
Pages: 23