Deep Visual-Semantic Hashing for Cross-Modal Retrieval

Cited by: 184
Authors
Cao, Yue [1]
Long, Mingsheng [1]
Wang, Jianmin [1]
Yang, Qiang [3]
Yu, Philip S. [1,2]
Affiliations
[1] Tsinghua Univ, Sch Software, Tsinghua Natl Lab TNList, Beijing, Peoples R China
[2] Univ Illinois, Chicago, IL USA
[3] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Deep hashing; cross-modal retrieval; multimodal embedding
DOI
10.1145/2939672.2939812
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Due to its storage and retrieval efficiency, hashing has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval. Cross-modal hashing, which enables efficient retrieval of images in response to text queries and vice versa, has received increasing attention recently. Most existing work on cross-modal hashing does not capture the spatial dependencies of images or the temporal dynamics of text sentences when learning feature representations and cross-modal embeddings that mitigate the heterogeneity of different modalities. This paper presents a new Deep Visual-Semantic Hashing (DVSH) model that generates compact hash codes of images and sentences in an end-to-end deep learning architecture, capturing the intrinsic cross-modal correspondences between visual data and natural language. DVSH is a hybrid deep architecture consisting of a visual-semantic fusion network that learns a joint embedding space of images and text sentences, and two modality-specific hashing networks that learn the hash functions used to generate compact binary codes. The architecture unifies joint multimodal embedding and cross-modal hashing through a novel combination of convolutional neural networks over images, recurrent neural networks over sentences, and a structured max-margin objective that integrates these components to enable the learning of similarity-preserving, high-quality hash codes. Extensive experiments show that DVSH yields state-of-the-art results in cross-modal retrieval on two image-sentence datasets: the standard IAPR TC-12 and the large-scale Microsoft COCO.
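To make the described architecture concrete, below is a minimal PyTorch sketch of the kind of hybrid model the abstract outlines: a CNN image encoder and an LSTM sentence encoder, each followed by a modality-specific hashing layer, trained jointly with a max-margin similarity-preserving loss. All layer sizes, the toy vocabulary, the cosine-based hinge loss, and the tanh relaxation of the binary constraint are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only; hyperparameters and loss details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageHashNet(nn.Module):
    """CNN over images, followed by a hashing layer; tanh relaxes the
    binary constraint so the network stays differentiable."""
    def __init__(self, code_len=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.hash = nn.Linear(32, code_len)

    def forward(self, images):
        return torch.tanh(self.hash(self.cnn(images)))

class SentenceHashNet(nn.Module):
    """LSTM over token sequences, followed by a hashing layer."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden=64, code_len=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.hash = nn.Linear(hidden, code_len)

    def forward(self, tokens):
        _, (h_n, _) = self.rnn(self.embed(tokens))
        return torch.tanh(self.hash(h_n[-1]))

def max_margin_loss(img_codes, txt_codes, sim, margin=0.5):
    """Hinge loss that pushes matched image-sentence pairs (sim=+1) toward
    high cosine similarity and mismatched pairs (sim=-1) toward low."""
    cos = F.cosine_similarity(img_codes, txt_codes)
    return torch.clamp(margin - sim * cos, min=0).mean()

# Toy end-to-end training step on random data.
img_net, txt_net = ImageHashNet(), SentenceHashNet()
images = torch.randn(8, 3, 64, 64)                    # batch of images
sentences = torch.randint(0, 1000, (8, 12))           # batch of token ids
sim = torch.randint(0, 2, (8,)).float() * 2 - 1       # +1 match, -1 mismatch
loss = max_margin_loss(img_net(images), txt_net(sentences), sim)
loss.backward()

# At retrieval time, binarize the relaxed codes for Hamming-space search.
with torch.no_grad():
    binary_codes = torch.sign(img_net(images))
```

The tanh relaxation followed by sign-binarization at retrieval time is a common device in deep hashing to keep training differentiable while still producing compact binary codes for fast Hamming-distance lookup.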
Pages: 1445-1454
Page count: 10