Image Retrieval Based on Vision Transformer and Masked Learning

Cited by: 5

Authors:

李锋 [1]

潘煌圣 [1]

盛守祥 [2]

王国栋 [2]

Affiliations:

[1] College of Computer Science and Technology, Donghua University

[2] Huafang Co., Ltd.

Source:

Journal of Donghua University (English Edition) | 2023, Vol. 40, No. 5

Keywords:

DOI:

10.19884/j.1672-5220.202301003

Chinese Library Classification (CLC) number:

TP391.41

Discipline classification code:

080203

Abstract:

Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach for unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed. Masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) model. In addition, the scheme for extracting image descriptors is discussed. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets, with marked improvements on instance-level datasets. For the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared with that of the original model.
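The masked-reconstruction objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 16-pixel patch size, the 75% mask ratio, the 224x224 input, and the helper names `patchify`, `random_mask`, and `masked_mse` are all assumptions chosen for illustration, and a zero array stands in for the decoder output. The sketch only shows how an image is split into patches, how a random subset is hidden from the encoder, and how the loss is restricted to the hidden patches.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector in which True marks a masked (hidden) patch."""
    num_masked = int(num_patches * mask_ratio)
    order = rng.permutation(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[order[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared pixel error averaged over the masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].mean()


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # stand-in for a real image
patches = patchify(image, patch_size=16)     # assumed patch size
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)  # assumed ratio
visible = patches[~mask]                     # the part an encoder would see
pred = np.zeros_like(patches)                # placeholder for decoder output
loss = masked_mse(pred, patches, mask)
```

In an actual fine-tuning loop, `pred` would come from a decoder applied to the encoded visible patches, and the loss would be minimized by gradient descent; both steps are omitted here.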

Pages: 539-547

Page count: 9
