Self-supervised Learning of Contextualized Local Visual Embeddings

Authors
Silva, Thalles [1]
Pedrini, Helio [1]
Rivera, Adin Ramirez [2]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, Campinas, SP, Brazil
[2] Univ Oslo, Dept Informat, Oslo, Norway
DOI
10.1109/ICCVW60793.2023.00025
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolution-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from the output feature maps of convolutional neural network (CNN) encoders. To learn contextualized embeddings, CLoVE introduces a normalized multihead self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in four dense prediction downstream tasks: object detection, instance segmentation, keypoint detection, and dense pose estimation. Code: https://github.com/sthalles/CLoVE.
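Illustrative sketch (not part of the original record): the abstract describes combining local features from different image locations according to their similarity. The plain-Python example below shows one way such a similarity-weighted, multihead mixing step could look. It is not the authors' implementation; the repository linked in the abstract is the authoritative source. The function names (`l2_normalize`, `softmax`, `contextualize`), the `temperature` parameter, the use of identity query/key/value projections, and the choice to L2-normalize each head slice before the dot product are all assumptions made for illustration, since the abstract does not say what "normalized" refers to.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(local_feats, num_heads=2, temperature=0.1):
    """Mix N local feature vectors by similarity, one head at a time.

    local_feats: list of N vectors of equal length D, e.g. the H*W spatial
    positions of a CNN feature map flattened into a list (assumption).
    Returns N vectors of length D.
    """
    dim = len(local_feats[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    out = [[] for _ in local_feats]

    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        # Assumption: no learned projections; queries, keys, and values are
        # all the L2-normalized slice of the input belonging to this head.
        heads = [l2_normalize(f[lo:hi]) for f in local_feats]
        for i, q in enumerate(heads):
            # Cosine similarity between location i and every location.
            scores = [sum(a * b for a, b in zip(q, k)) / temperature
                      for k in heads]
            weights = softmax(scores)
            # Similarity-weighted average of the value vectors.
            mixed = [sum(w * v[d] for w, v in zip(weights, heads))
                     for d in range(head_dim)]
            out[i].extend(mixed)
    return out


# Four local features of dimension 4 (toy values, not real data).
feats = [
    [1.0, 0.0, 0.0, 1.0],
    [0.9, 0.1, 0.1, 0.9],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
]
embeddings = contextualize(feats, num_heads=2, temperature=0.1)
```

With a low temperature, each output is dominated by the locations most similar to it, so the first and last toy features (which are identical) receive identical contextualized embeddings.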
Pages: 177-186
Page count: 10
Related papers
50 entries in total
  • [31] Self-Supervised Visual Representation Learning from Hierarchical Grouping
    Zhang, Xiao
    Maire, Michael
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [32] Audio-visual self-supervised representation learning: A survey
    Alsuwat, Manal
    Al-Shareef, Sarah
    Alghamdi, Manal
    NEUROCOMPUTING, 2025, 634
  • [33] Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
    Zhao, Yucheng
    Wang, Guangting
    Luo, Chong
    Zeng, Wenjun
    Zha, Zheng-Jun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 10140 - 10149
  • [34] Self-supervised learning for visual tracking and recognition of human hand
    Wu, Y
    Huang, TS
    SEVENTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-2001) / TWELFTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-2000), 2000, : 243 - 248
  • [35] ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning
    Wang, Yufei
    Narasimhan, Gautham Narayan
    Lin, Xingyu
    Okorn, Brian
    Held, David
    CONFERENCE ON ROBOT LEARNING, VOL 155, 2020, 155 : 1030 - 1048
  • [36] Towards Efficient and Effective Self-supervised Learning of Visual Representations
    Addepalli, Sravanti
    Bhogale, Kaushal
    Dey, Priyam
    Babu, R. Venkatesh
    COMPUTER VISION, ECCV 2022, PT XXXI, 2022, 13691 : 523 - 538
  • [37] Self-Supervised Visual Representation Learning via Residual Momentum
    Pham, Trung Xuan
    Niu, Axi
    Zhang, Kang
    Jin, Tee Joshua Tian
    Hong, Ji Woo
    Yoo, Chang D.
    IEEE ACCESS, 2023, 11 : 116706 - 116720
  • [38] Dense Semantic Contrast for Self-Supervised Visual Representation Learning
    Li, Xiaoni
    Zhou, Yu
    Zhang, Yifei
    Zhang, Aoting
    Wang, Wei
    Jiang, Ning
    Wu, Haiying
    Wang, Weiping
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1368 - 1376
  • [39] Boost Supervised Pretraining for Visual Transfer Learning: Implications of Self-Supervised Contrastive Representation Learning
    Sun, Jinghan
    Wei, Dong
    Ma, Kai
    Wang, Liansheng
    Zheng, Yefeng
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 2307 - 2315
  • [40] Stable Contrastive Learning for Self-Supervised Sentence Embeddings With Pseudo-Siamese Mutual Learning
    Xie, Yutao
    Wu, Qiyu
    Chen, Wei
    Wang, Tengjiao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 3046 - 3059