Image Retrieval Based on Vision Transformer and Masked Learning

Cited by: 5
Authors
李锋 [1]
潘煌圣 [1]
盛守祥 [2]
王国栋 [2]
Affiliations
[1] College of Computer Science and Technology, Donghua University
[2] Huafang Co., Ltd.
DOI
10.19884/j.1672-5220.202301003
Chinese Library Classification (CLC) number
TP391.41
Discipline classification code
080203
Abstract
Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach for unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed. Masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) model. In addition, the scheme for extracting image descriptors is discussed. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets and yields marked improvements on instance-level datasets. For the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared with that of the original model.
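The two mechanisms named in the abstract, reconstructing masked patches and retrieving images by comparing descriptors, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the 196-patch grid, and the 0.75 mask ratio are illustrative assumptions (the abstract gives no such values), and cosine similarity stands in for whatever descriptor comparison the paper actually uses.

```python
import math
import random


def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked) lists."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(target, predicted, masked):
    """Mean squared pixel error computed over masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target[i], predicted[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


def cosine(u, v):
    """Cosine similarity of two descriptors (assumed comparison metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, gallery):
    """Return gallery indices sorted from most to least similar to query."""
    return sorted(range(len(gallery)),
                  key=lambda i: cosine(query, gallery[i]),
                  reverse=True)


if __name__ == "__main__":
    # Assumed setup: 14 x 14 = 196 patches, 75% of them hidden.
    visible, masked = random_mask(196, 0.75, random.Random(0))
    print(len(visible), "visible patches,", len(masked), "masked patches")
    # Toy 2-D descriptors standing in for ViT global features.
    print(rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))
```

In this sketch the encoder would see only the `visible` patches, and the loss ignores them, so the fine-tuning signal comes entirely from the hidden regions and needs no labels.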
Pages: 539-547
Page count: 9