Self-supervised Learning of Contextualized Local Visual Embeddings

Authors
Silva, Thalles [1]
Pedrini, Helio [1]
Rivera, Adin Ramirez [2]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, Campinas, SP, Brazil
[2] Univ Oslo, Dept Informat, Oslo, Norway
DOI
10.1109/ICCVW60793.2023.00025
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised, convolution-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from the output feature maps of convolutional neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multi-head self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in four dense prediction downstream tasks: object detection, instance segmentation, keypoint detection, and dense pose estimation. Code: https://github.com/sthalles/CLoVE.
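The idea of combining local features from different image locations by similarity can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that "normalized" means L2-normalizing each local feature so that attention scores are cosine similarities, it splits channels evenly across heads, and it omits the learned query/key/value projections a real attention layer would have. The function name `contextualize` and the parameters `num_heads` and `temperature` are hypothetical.

```python
import numpy as np

def contextualize(feature_map, num_heads=4, temperature=0.1):
    """Mix local features by cosine similarity, one head per channel group.

    feature_map: array of shape (C, H, W), e.g. a CNN encoder output.
    Returns an array of the same shape in which every location is a
    similarity-weighted average of all locations (illustrative only).
    """
    c, h, w = feature_map.shape
    assert c % num_heads == 0, "channels must divide evenly across heads"
    # (C, H, W) -> (heads, N, C / heads), where N = H * W local features
    local = feature_map.reshape(num_heads, c // num_heads, h * w).transpose(0, 2, 1)
    # Assumption: normalization = unit-length features, so scores are cosines
    unit = local / (np.linalg.norm(local, axis=-1, keepdims=True) + 1e-8)
    scores = unit @ unit.transpose(0, 2, 1) / temperature  # (heads, N, N)
    # Numerically stable softmax over the last axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    mixed = weights @ local  # (heads, N, C / heads)
    # Back to the original (C, H, W) layout
    return mixed.transpose(0, 2, 1).reshape(c, h, w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.normal(size=(8, 4, 4))
    print(contextualize(fmap).shape)
```

Because each output location is a convex combination of the input locations, a lower `temperature` concentrates the weights on the most similar locations, while a higher one approaches plain average pooling within each head.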
Pages: 177-186
Number of pages: 10