A triple fusion model for cross-modal deep hashing retrieval

Cited by: 1
Authors
Wang, Hufei [1]
Zhao, Kaiqiang [1]
Zhao, Dexin [1]
Affiliations
[1] Tianjin University of Technology, Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, 391 West Binshui Rd, Tianjin 300384, People's Republic of China
Keywords
Hashing learning; Cross-modal retrieval; Semantic similarity; Shared semantics
DOI
10.1007/s00530-022-01005-6
CLC number
TP [automation technology, computer technology]
Subject classification number
0812
Abstract
Deep cross-modal hashing retrieval is attracting increasing attention in resource retrieval, as it offers low storage cost and fast retrieval speed. However, most current methods focus on the semantic similarity between hash codes and ignore the similarity between the features the model extracts from the different modalities, which leads to suboptimal results. In addition, the correlation between modalities is difficult to exploit adequately. To strengthen the correlation between modalities, this paper proposes a triple fusion model for cross-modal deep hashing retrieval (SSTFH). To mitigate the feature information lost as features pass through fully connected layers, we design a triple fusion strategy: the first and second fusions are performed on images and text, respectively, to obtain modality-specific features, and the third fusion produces more relevant semantic features. We further use the shared semantic information in these semantic features to guide the model in extracting correlations between the modalities. Comprehensive experiments were conducted on the benchmark IAPR TC-12 and MS COCO datasets. On MS COCO, our approach outperforms all deep baselines by an average of 7.74% on the image-to-text task and 8.72% on the text-to-image task. On IAPR TC-12, it improves image retrieval by an average of 7.07% and text retrieval by 4.88%.
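The abstract describes the triple fusion strategy only at a high level. The following PyTorch sketch illustrates one plausible wiring of such a head, assuming concatenation-plus-linear fusion of shallow and deep features in each modality and a tanh relaxation of the binary codes; all module names, dimensions, and design choices here (TripleFusionHead, img_dim, hidden_dim, etc.) are illustrative assumptions, not the paper's actual SSTFH implementation.

```python
# Minimal sketch of the triple fusion idea, in PyTorch. Everything here
# (module names, feature dimensions, concatenation + linear fusion, the
# tanh relaxation of the hash codes) is an illustrative assumption; the
# paper's actual SSTFH architecture may differ.
import torch
import torch.nn as nn

class TripleFusionHead(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=1024, hidden_dim=512, hash_bits=64):
        super().__init__()
        # First fusion: combine shallow and deep image features, compensating
        # for information dropped in the fully connected layers.
        self.img_fuse = nn.Linear(img_dim * 2, hidden_dim)
        # Second fusion: the analogous combination for text features.
        self.txt_fuse = nn.Linear(txt_dim * 2, hidden_dim)
        # Third fusion: merge the two modality-specific features into a
        # shared semantic feature that can guide both hashing branches.
        self.shared_fuse = nn.Linear(hidden_dim * 2, hidden_dim)
        # Hash layers output continuous codes; tanh keeps them in (-1, 1)
        # so they can be binarized with sign() at retrieval time.
        self.img_hash = nn.Linear(hidden_dim, hash_bits)
        self.txt_hash = nn.Linear(hidden_dim, hash_bits)

    def forward(self, img_shallow, img_deep, txt_shallow, txt_deep):
        f_img = torch.relu(self.img_fuse(torch.cat([img_shallow, img_deep], dim=1)))
        f_txt = torch.relu(self.txt_fuse(torch.cat([txt_shallow, txt_deep], dim=1)))
        # Shared semantics from the third fusion; a training loss would pull
        # both modality codes toward this representation.
        f_shared = torch.relu(self.shared_fuse(torch.cat([f_img, f_txt], dim=1)))
        return torch.tanh(self.img_hash(f_img)), torch.tanh(self.txt_hash(f_txt)), f_shared
```

At query time, the continuous codes would be binarized with torch.sign and compared by Hamming distance, which is what gives hashing the storage and speed advantage the abstract mentions.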
Pages: 347-359
Page count: 13
Related papers
50 records in total
  • [1] A triple fusion model for cross-modal deep hashing retrieval
    Wang, Hufei
    Zhao, Kaiqiang
    Zhao, Dexin
    [J]. Multimedia Systems, 2023, 29 : 347 - 359
  • [2] Deep Multiscale Fusion Hashing for Cross-Modal Retrieval
    Nie, Xiushan
    Wang, Bowei
    Li, Jiajia
    Hao, Fanchang
    Jian, Muwei
    Yin, Yilong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (01) : 401 - 410
  • [3] Deep Label Feature Fusion Hashing for Cross-Modal Retrieval
    Ren, Dongxiao
    Xu, Weihua
    Wang, Zhonghua
    Sun, Qinxiu
    [J]. IEEE ACCESS, 2022, 10 : 100276 - 100285
  • [4] Unsupervised Deep Fusion Cross-modal Hashing
    Huang, Jiaming
    Min, Chen
    Jing, Liping
    [J]. ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019, : 358 - 366
  • [5] Joint feature fusion hashing for cross-modal retrieval
    Cao, Yuxia
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024,
  • [6] Discrete Fusion Adversarial Hashing for cross-modal retrieval
    Li, Jing
    Yu, En
    Ma, Jianhua
    Chang, Xiaojun
    Zhang, Huaxiang
    Sun, Jiande
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 253
  • [7] Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval
    Zhan, Yu-Wei
    Luo, Xin
    Wang, Yongxin
    Xu, Xin-Shun
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3386 - 3394
  • [8] Deep Hashing Similarity Learning for Cross-Modal Retrieval
    Ma, Ying
    Wang, Meng
    Lu, Guangyun
    Sun, Yajun
    [J]. IEEE ACCESS, 2024, 12 : 8609 - 8618
  • [9] FUSION-SUPERVISED DEEP CROSS-MODAL HASHING
    Wang, Li
    Zhu, Lei
    Yu, En
    Sun, Jiande
    Zhang, Huaxiang
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 37 - 42
  • [10] Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval
    Hu, Zhikai
    Liu, Xin
    Wang, Xingzhi
    Cheung, Yiu-ming
    Wang, Nannan
    Chen, Yewang
    [J]. ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 141 - 149