Image Retrieval Based on Vision Transformer and Masked Learning

Cited by: 5
Authors
李锋 [1]
潘煌圣 [1]
盛守祥 [2]
王国栋 [2]
Affiliations
[1] College of Computer Science and Technology, Donghua University
[2] Huafang Co., Ltd.
DOI
10.19884/j.1672-5220.202301003
Chinese Library Classification (CLC) number
TP391.41
Discipline classification code
080203
Abstract
Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach for unlabeled scenarios. A method for fine-tuning feature extraction networks based on masked learning is proposed: masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) model. In addition, schemes for extracting image descriptors are discussed. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets and yields marked improvements on instance-level datasets. On the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared with that of the original model.
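The masked-reconstruction objective and the descriptor-based retrieval step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the 75% mask ratio, the 196-patch grid, and the use of cosine similarity for ranking are all assumptions chosen for illustration, and the ViT encoder and decoder themselves are omitted.

```python
import math
import random


def random_mask(num_patches, mask_ratio, rng):
    """Split patch indices into (visible, masked) lists, chosen at random."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(target, predicted, masked):
    """Mean squared pixel error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target[i], predicted[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(query, database):
    """Return database indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, d) for d in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


# Assumed setting: 196 patches (a 14 x 14 grid) with a 75% mask ratio.
visible, masked = random_mask(196, 0.75, random.Random(0))
```

In this sketch only the visible patches would be passed to the encoder, and the loss ignores the visible patches entirely, which is what makes the reconstruction task self-supervised. At retrieval time, a global descriptor per image (however it is pooled from the encoder output) is compared against the database descriptors with `rank_by_similarity`.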
Pages: 539-547
Page count: 9