Siamese Image Modeling for Self-Supervised Vision Representation Learning

Cited by: 11
Authors
Tao, Chenxin [1 ]
Zhu, Xizhou [2 ,4 ]
Su, Weijie [3 ]
Huang, Gao [1 ]
Li, Bin [3 ]
Zhou, Jie [1 ]
Qiao, Yu [4 ]
Wang, Xiaogang [5 ]
Dai, Jifeng [1 ,4 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] SenseTime Res, Hong Kong, Peoples R China
[3] Univ Sci & Technol China, Hefei, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[5] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
DOI
10.1109/CVPR52729.2023.00212
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-supervised learning (SSL) has delivered superior performance on a variety of downstream vision tasks. Two mainstream SSL frameworks have been proposed, i.e., Instance Discrimination (ID) and Masked Image Modeling (MIM). ID pulls together representations from different views of the same image, while avoiding feature collapse. However, it lacks spatial sensitivity, which requires modeling the local structure within each image. On the other hand, MIM reconstructs the original content given a masked image. It instead lacks good semantic alignment, which requires projecting semantically similar views into nearby representations. To address this dilemma, we observe that (1) semantic alignment can be achieved by matching different image views with strong augmentations; (2) spatial sensitivity can benefit from predicting dense representations with masked images. Driven by these analyses, we propose Siamese Image Modeling (SiameseIM), which predicts the dense representations of an augmented view, based on another masked view from the same image but with different augmentations. SiameseIM uses a Siamese network with two branches. The online branch encodes the first view, and predicts the second view's representation according to the relative positions between these two views. The target branch produces the target by encoding the second view. SiameseIM can surpass both ID and MIM on a wide range of downstream tasks, including ImageNet finetuning and linear probing, COCO and LVIS detection, and ADE20k semantic segmentation. The improvement is more significant in few-shot, long-tail and robustness-concerned scenarios. Code shall be released.
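The abstract describes an online branch that encodes a masked first view and predicts dense representations of a second view, matched against a target branch that encodes the second view. A minimal numpy sketch of one such training step follows; the linear "encoder", the toy dimensions, and the plain linear decoder standing in for the position-conditioned decoder are all illustrative assumptions, not the paper's architecture (which uses ViT backbones and a momentum target encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper uses ViT patch tokens).
NUM_PATCHES, DIM = 16, 8

def encoder(x, W):
    """Linear stand-in for a ViT encoder: per-patch projection."""
    return np.tanh(x @ W)

def siameseim_step(view1, view2, mask, W_online, W_target, W_dec):
    """One SiameseIM-style training step (simplified sketch).

    view1, view2: (NUM_PATCHES, DIM) patch features of two augmented views.
    mask: boolean (NUM_PATCHES,), True = patch of view1 visible online.
    """
    # Online branch: encode only the visible (unmasked) patches of view 1.
    visible = np.where(mask[:, None], view1, 0.0)
    online = encoder(visible, W_online)
    # Decoder predicts dense features for ALL patch positions of view 2.
    # In the paper this is conditioned on the relative positions of the two
    # crops; a plain linear map stands in for that conditioning here.
    pred = online @ W_dec
    # Target branch: momentum (EMA) encoder on the full second view;
    # in real training no gradient flows through this branch.
    target = encoder(view2, W_target)
    # Dense loss: error averaged over every patch position, not one global token.
    loss = float(np.mean((pred - target) ** 2))
    return pred, target, loss

view1 = rng.standard_normal((NUM_PATCHES, DIM))
view2 = rng.standard_normal((NUM_PATCHES, DIM))
mask = rng.random(NUM_PATCHES) < 0.25           # ~75% of patches masked
W_online = 0.1 * rng.standard_normal((DIM, DIM))
W_target = W_online.copy()                      # EMA target starts as a copy
W_dec = 0.1 * rng.standard_normal((DIM, DIM))

pred, target, loss = siameseim_step(view1, view2, mask,
                                    W_online, W_target, W_dec)
print(pred.shape, target.shape, loss)
```

The key difference from plain Instance Discrimination is that the loss is computed densely per patch position, and the key difference from plain MIM is that the prediction target comes from a differently augmented view rather than the original pixels.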
Pages: 2132-2141
Page count: 10