Image Retrieval Based on Vision Transformer and Masked Learning

Cited by: 5

Authors:

李锋 [1]

潘煌圣 [1]

盛守祥 [2]

王国栋 [2]

Affiliations:

[1] College of Computer Science and Technology, Donghua University

[2] Huafang Co., Ltd.

Source:

Journal of Donghua University (English Edition) | 2023 / Vol. 40 / No. 5


DOI:

10.19884/j.1672-5220.202301003

Chinese Library Classification (CLC) number:

TP391.41

Discipline code:

080203

Abstract:

Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach for unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed: masked autoencoders (MAE) are used to fine-tune a vision transformer (ViT) model. In addition, schemes for extracting image descriptors are discussed. The MAE encoder uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets and yields marked improvements on instance-level datasets. On the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model improves by 7% and 17%, respectively, compared with the original model.
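The abstract combines two ingredients: self-supervised fine-tuning that reconstructs masked patches, and retrieval by comparing image descriptors. The NumPy sketch below illustrates both under stated assumptions; it is not the authors' code. The 16-pixel patch size and 0.75 mask ratio are the defaults of the original MAE recipe and are assumed here, since this record does not give the paper's settings. The reconstruction loss is computed only over masked patches, as in MAE. Using an L2-normalized global vector (for example, a ViT class token) ranked by cosine similarity is likewise an assumed descriptor scheme, and all function names are hypothetical.

```python
import numpy as np

def patchify(images, p):
    """Split (N, H, W, C) images into (N, L, p*p*C) flattened patches, L = (H/p)*(W/p)."""
    n, h, w, c = images.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    x = images.reshape(n, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (N, H/p, W/p, p, p, C)
    return x.reshape(n, (h // p) * (w // p), p * p * c)

def random_mask(n, num_patches, mask_ratio, rng):
    """Boolean mask (N, L); True marks a patch hidden from the encoder (assumed MAE-style)."""
    num_keep = int(num_patches * (1 - mask_ratio))
    ids_shuffle = np.argsort(rng.random((n, num_patches)), axis=1)  # random permutation per image
    mask = np.ones((n, num_patches), dtype=bool)
    np.put_along_axis(mask, ids_shuffle[:, :num_keep], False, axis=1)  # first num_keep stay visible
    return mask

def masked_mse(pred, target, mask):
    """Mean squared pixel error averaged only over masked patches."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # (N, L)
    return float((per_patch * mask).sum() / mask.sum())

def l2_normalize(x, eps=1e-12):
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def rank_database(query_desc, db_desc):
    """Indices of database images sorted by descending cosine similarity, one row per query."""
    sims = l2_normalize(query_desc) @ l2_normalize(db_desc).T
    return np.argsort(-sims, axis=1)

rng = np.random.default_rng(0)
imgs = rng.random((2, 224, 224, 3))
patches = patchify(imgs, 16)                       # (2, 196, 768)
mask = random_mask(2, patches.shape[1], 0.75, rng)  # 147 of 196 patches masked per image
print(patches.shape, mask.sum(axis=1))
```

With a 224x224 input and 16-pixel patches there are 196 patches, so a 0.75 ratio leaves 49 visible patches for the encoder. Because the loss ignores visible patches, the model can only lower it by inferring the hidden content from context, which is what makes the pretext task useful for learning global features.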


Pages: 539-547

Page count: 9
