Deep semantic hashing with dual attention for cross-modal retrieval

Cited by: 5
Authors
Wu, Jiagao [1 ,2 ]
Weng, Weiwei [1 ,2 ]
Fu, Junxia [1 ,2 ]
Liu, Linfeng [1 ,2 ]
Hu, Bin [3 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, Nanjing 210023, Jiangsu, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Jiangsu, Peoples R China
[3] Nanjing Normal Univ, Key Lab Virtual Geog Environm, Minist Educ, Nanjing 210046, Jiangsu, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2022, Vol. 34, No. 7
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Deep hashing; Semantic label network; Attention mechanism; CODES;
DOI
10.1007/s00521-021-06696-y
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the explosive growth of multimodal data, cross-modal retrieval has attracted increasing research interest. Hashing-based methods have made great advances in cross-modal retrieval thanks to their low storage cost and fast query speed. However, improving retrieval accuracy remains a crucial challenge because of the heterogeneity gap between modalities. To tackle this problem, in this paper we propose a new two-stage cross-modal retrieval method called Deep Semantic Hashing with Dual Attention (DSHDA). In the first stage of DSHDA, a Semantic Label Network (SeLabNet) is designed to extract label semantic features and hash codes by training on the multi-label annotations, which places the learning of different modalities in a common semantic space and effectively bridges the modality gap. In the second stage, we propose a deep neural network that integrates feature learning and hash code learning for each modality into the same framework; its training is guided by the label semantic features and hash codes generated by SeLabNet so as to maximize cross-modal semantic relevance. Moreover, dual attention mechanisms are used in our neural networks: (1) Lo-attention extracts the local key information of each modality and improves the quality of modality features; (2) Co-attention strengthens the relationship between different modalities to produce more consistent and accurate hash codes. Extensive experiments on two real datasets with image-text modalities demonstrate the superiority of the proposed method in cross-modal retrieval tasks.
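Since the abstract describes a concrete architecture, a short sketch may help fix the ideas. Below is a minimal, hypothetical PyTorch sketch of the second-stage components as described above: a per-modality network whose features pass through a local (Lo-) attention, a co-attention coupling the two modalities, and a tanh relaxation producing hash-like codes. All names (LoAttention, CoAttention, ModalityHashNet), layer sizes, and the exact attention formulations are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LoAttention(nn.Module):
    # Local attention: re-weights the regions/words of a single modality.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (batch, regions, dim)
        w = torch.softmax(self.score(x), dim=1)  # attention weights over regions
        return (w * x).sum(dim=1)                # (batch, dim) attended feature

class CoAttention(nn.Module):
    # Co-attention: each modality's feature gates the other's.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, a, b):                     # a, b: (batch, dim)
        return a * torch.sigmoid(self.proj(b)), b * torch.sigmoid(self.proj(a))

class ModalityHashNet(nn.Module):
    # Per-modality network: embed -> Lo-attention -> relaxed hash codes.
    def __init__(self, in_dim, dim, code_len):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)
        self.lo_attn = LoAttention(dim)
        self.hash = nn.Linear(dim, code_len)

    def forward(self, x):                        # x: (batch, regions, in_dim)
        return self.lo_attn(torch.relu(self.embed(x)))

    def codes(self, feat):
        return torch.tanh(self.hash(feat))       # relaxed binary codes in (-1, 1)

# Illustrative forward pass with made-up feature shapes.
img_net = ModalityHashNet(2048, 512, 64)        # e.g., CNN region features
txt_net = ModalityHashNet(300, 512, 64)         # e.g., word embeddings
co = CoAttention(512)
img = torch.randn(8, 36, 2048)                  # 8 images, 36 regions each
txt = torch.randn(8, 20, 300)                   # 8 texts, 20 words each
f_img, f_txt = co(img_net(img), txt_net(txt))
b_img, b_txt = img_net.codes(f_img), txt_net.codes(f_txt)   # (8, 64) each
# Training would push b_img/b_txt toward the label hash codes produced by
# SeLabNet; sign(b) would yield the final binary codes at retrieval time.

The tanh relaxation is a standard trick in deep hashing that keeps code learning differentiable; binarization to +/-1 happens only at retrieval time.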
Pages: 5397-5416
Number of pages: 20