Deep Self-Supervised Hashing With Fine-Grained Similarity Mining for Cross-Modal Retrieval

Citations: 0
Authors
Han, Lijun [1 ]
Wang, Renlin [1 ]
Chen, Chunlei [1 ]
Zhang, Huihui [1 ]
Zhang, Yujie [2 ]
Zhang, Wenfeng [3 ]
Affiliations
[1] Weifang Univ, Sch Comp Engn, Weifang 261061, Peoples R China
[2] Weifang Peoples Hosp, Weifang 261061, Peoples R China
[3] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
Keywords
Semantics; Dictionaries; Transformers; Annotations; Data mining; Binary codes; Training; Deep learning; Hash functions; Encoding; Information retrieval; Memory; Text processing; Image processing; Cross-modal retrieval; deep hashing; fine-grained similarity; semantic dictionary; transformer encoders;
DOI
10.1109/ACCESS.2024.3371173
CLC Classification
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Owing to their storage efficiency and fast retrieval speed, hashing methods have attracted considerable attention in cross-modal retrieval applications. In contrast to traditional cross-modal hashing based on handcrafted features, deep cross-modal hashing combines the strengths of deep learning and hashing to encode raw multimodal data into compact binary codes while preserving semantic information. However, most existing deep cross-modal hashing methods define the semantic similarity between heterogeneous modalities simply by counting shared semantic labels (e.g., two samples are considered similar if they share at least one label and dissimilar otherwise), which fails to capture the accurate multi-label semantic relations between heterogeneous data. In this paper, we propose a new Deep Self-supervised Hashing with Fine-grained Similarity Mining (DSH-FSM) framework that efficiently preserves fine-grained multi-label semantic similarity and learns a highly separable embedding space. Specifically, through an asymmetric guidance strategy, a novel Semantic-Network is introduced into cross-modal hashing to learn two semantic dictionaries, a semantic feature dictionary and a semantic code dictionary, which guide the Image-Network and the Text-Network in capturing multi-label semantic relevance across modalities. Based on the learned semantic dictionaries, an asymmetric margin-scalable loss is proposed to exploit fine-grained pair-wise similarity information, contributing to the production of similarity-preserving and discriminative binary codes. In addition, two feature extractors with transformer encoders are designed for the Image-Network and the Text-Network to extract representative semantic characteristics from raw heterogeneous samples.
Extensive experiments on three widely used benchmark datasets show that the proposed DSH-FSM framework achieves state-of-the-art cross-modal similarity search performance, improving mAP over the best competing methods by 1.9%, 9.1%, and 9.8%, respectively.
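The contrast the abstract draws between coarse and fine-grained similarity can be sketched in a few lines. The binary definition below is the one the abstract criticizes (similar iff at least one label is shared); the cosine-based fine-grained score is an illustrative choice of multi-label similarity, not necessarily the exact formulation used inside DSH-FSM, and the five-label vectors are hypothetical.

```python
import numpy as np

def coarse_similarity(a, b):
    """Binary similarity used by most deep cross-modal hashing methods:
    two samples are similar (1) iff they share at least one label."""
    return int(np.dot(a, b) > 0)

def fine_grained_similarity(a, b):
    """One possible fine-grained multi-label similarity: cosine of the
    multi-hot label vectors (illustrative, not the paper's definition)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Multi-hot label vectors over 5 hypothetical semantic labels.
img = np.array([1, 1, 1, 0, 0])   # image annotated with labels 0-2
txt1 = np.array([1, 1, 1, 0, 0])  # text sharing all three labels
txt2 = np.array([1, 0, 0, 0, 1])  # text sharing only one label

# The coarse definition cannot tell the two pairs apart ...
assert coarse_similarity(img, txt1) == coarse_similarity(img, txt2) == 1

# ... while the fine-grained score ranks the pairs correctly.
print(fine_grained_similarity(img, txt1))  # 1.0
print(fine_grained_similarity(img, txt2))  # ~0.408
```

A loss supervised by such graded scores can penalize pairs in proportion to how much their label sets overlap, rather than treating every label-sharing pair as equally similar.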
Pages: 31756 - 31770 (15 pages)
Related Papers
50 records
  • [1] Deep cross-modal hashing with fine-grained similarity
    Chen, Yangdong
    Quan, Jiaqi
    Zhang, Yuejie
    Feng, Rui
    Zhang, Tao
    [J]. APPLIED INTELLIGENCE, 2023, 53 (23) : 28954 - 28973
  • [2] Fine-grained similarity semantic preserving deep hashing for cross-modal retrieval
    Li, Guoyou
    Peng, Qingjun
    Zou, Dexu
    Yang, Jinyue
    Shu, Zhenqiu
    [J]. FRONTIERS IN PHYSICS, 2023, 11
  • [3] Self-supervised incomplete cross-modal hashing retrieval
    Peng, Shouyong
    Yao, Tao
    Li, Ying
    Wang, Gang
    Wang, Lili
    Yan, Zhiming
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2025, 262
  • [4] Deep Multiscale Fine-Grained Hashing for Remote Sensing Cross-Modal Retrieval
    Huang, Jiaxiang
    Feng, Yong
    Zhou, Mingliang
    Xiong, Xiancai
    Wang, Yongheng
    Qiang, Baohua
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [5] Deep supervised fused similarity hashing for cross-modal retrieval
    Ng, Wing W. Y.
    Xu, Yongzhi
    Tian, Xing
    Wang, Hui
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (39) : 86537 - 86555
  • [6] Self-supervised deep semantics-preserving hashing for cross-modal retrieval
    Lu, B.
    Duan, X.
    Yuan, Y.
    [J]. QINGHUA DAXUE XUEBAO/JOURNAL OF TSINGHUA UNIVERSITY, 2022, 62 (09) : 1442 - 1449
  • [7] Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval
    Li, Chao
    Deng, Cheng
    Li, Ning
    Liu, Wei
    Gao, Xinbo
    Tao, Dacheng
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018 : 4242 - 4251
  • [8] Deep attentional fine-grained similarity network with adversarial learning for cross-modal retrieval
    Cheng, Qingrong
    Gu, Xiaodong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 : 31401 - 31428
  • [9] Autoencoder-based self-supervised hashing for cross-modal retrieval
    Li, Yifan
    Wang, Xuan
    Cui, Lei
    Zhang, Jiajia
    Huang, Chengkai
    Luo, Xuan
    Qi, Shuhan
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 17257 - 17274