Mixed Autoencoder for Self-supervised Visual Representation Learning

Cited by: 10
Authors
Chen, Kai [1 ]
Liu, Zhili [1 ,2 ]
Hong, Lanqing [2 ]
Xu, Hang [2 ]
Li, Zhenguo [2 ]
Yeung, Dit-Yan [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
Keywords
DOI
10.1109/CVPR52729.2023.02178
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks by randomly masking image patches and reconstructing them. However, effective data augmentation strategies for MAE remain an open question, unlike in contrastive learning, where augmentation plays a central role. This paper studies the prevailing mixing augmentation for MAE. We first demonstrate that naive mixing in fact degrades model performance due to the increase in mutual information (MI). To address this, we propose homologous recognition, an auxiliary pretext task that not only alleviates the MI increase by explicitly requiring each patch to recognize its homologous patches, but also performs object-aware self-supervised pre-training for better downstream dense perception. With extensive experiments, we demonstrate that our proposed Mixed Autoencoder (MixedAE) achieves state-of-the-art transfer results among masked image modeling (MIM) augmentations on different downstream tasks, with significant efficiency gains. Specifically, MixedAE outperforms MAE by +0.3% accuracy, +1.7 mIoU and +0.9 AP on ImageNet-1K, ADE20K and COCO respectively with a standard ViT-Base. Moreover, MixedAE surpasses iBOT, a strong MIM method combined with instance discrimination, while accelerating training by 2x. To the best of our knowledge, this is the first work to consider mixing for MIM from the perspective of pretext task design. Code will be made available.
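The abstract's core mechanism, mixing patches from two images and asking each patch to recognize its homologous patches, can be illustrated with a minimal sketch. The snippet below is an illustration under stated assumptions, not the authors' released code: the function name `mix_patches`, the default 50% mix ratio, and the pairwise boolean homology target are hypothetical choices made for exposition.

```python
import torch

def mix_patches(patches_a, patches_b, mix_ratio=0.5):
    """Patch-level mixing of two images, a minimal sketch.

    patches_a, patches_b: (N, D) tensors of N patch tokens from two
    different images. A random subset of positions is taken from
    image B, the rest from image A.

    Returns the mixed patch sequence and a boolean (N, N) homology
    matrix whose (i, j) entry is True iff patches i and j originate
    from the same source image.
    """
    num_patches = patches_a.size(0)
    num_from_b = int(num_patches * mix_ratio)

    # Randomly choose which positions come from image B.
    from_b = torch.zeros(num_patches, dtype=torch.bool)
    from_b[torch.randperm(num_patches)[:num_from_b]] = True

    # Assemble the mixed sequence position by position.
    mixed = torch.where(from_b.unsqueeze(-1), patches_b, patches_a)

    # Homologous-recognition target: each patch should identify
    # which other patches came from the same image as itself.
    homologous = from_b.unsqueeze(0) == from_b.unsqueeze(1)
    return mixed, homologous
```

In pre-training, an auxiliary objective could supervise patch features against `homologous` (for example, a binary prediction per patch pair), discouraging patches from aggregating information across source images; the exact loss and its combination with the MAE reconstruction objective follow the paper's formulation, which this sketch does not reproduce.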
Pages: 22742-22751
Page count: 10
Related Papers (50 total)
  • [31] Self-Distilled Self-supervised Representation Learning
    Jang, Jiho
    Kim, Seonhoon
    Yoo, Kiyoon
    Kong, Chaerin
    Kim, Jangho
    Kwak, Nojun
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2828 - 2838
  • [32] Towards Latent Masked Image Modeling for Self-supervised Visual Representation Learning
    Wei, Yibing
    Gupta, Abhinav
    Morgado, Pedro
    COMPUTER VISION - ECCV 2024, PT XXXIX, 2025, 15097 : 1 - 17
  • [33] solo-learn: A Library of Self-supervised Methods for Visual Representation Learning
    Turrisi da Costa, Victor G.
    Fini, Enrico
    Nabi, Moin
    Sebe, Nicu
    Ricci, Elisa
JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23 : 1 - 6
  • [34] Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning
    Song, Kaiyou
    Zhang, Shan
    Luo, Zimeng
    Wang, Tong
    Xie, Jin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 16053 - 16062
  • [35] Feature selection and cascade dimensionality reduction for self-supervised visual representation learning
    Qu, Peixin
    Jin, Songlin
    Tian, Yongqin
    Zhou, Ling
    Zheng, Ying
    Zhang, Weidong
    Xu, Yibo
    Pan, Xipeng
    Zhao, Wenyi
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 106
  • [36] Self-Supervised Representation Learning using Visual Field Expansion on Digital Pathology
    Boyd, Joseph
    Liashuha, Mykola
    Deutsch, Eric
    Paragios, Nikos
    Christodoulidis, Stergios
    Vakalopoulou, Maria
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 639 - 647
  • [37] Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
    Ge, Chongjian
    Liang, Youwei
    Song, Yibing
    Jiao, Jianbo
    Wang, Jue
    Luo, Ping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [38] Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos
    Feng, Zishun
    Tu, Ming
    Xia, Rui
    Wang, Yuxuan
    Krishnamurthy, Ashok
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 5671 - 5672
  • [40] Self-Supervised Speech Representation Learning: A Review
    Mohamed, Abdelrahman
    Lee, Hung-yi
    Borgholt, Lasse
    Havtorn, Jakob D.
    Edin, Joakim
    Igel, Christian
    Kirchhoff, Katrin
    Li, Shang-Wen
    Livescu, Karen
    Maaloe, Lars
    Sainath, Tara N.
    Watanabe, Shinji
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1179 - 1210