Deep Visual-Semantic Hashing for Cross-Modal Retrieval

Cited by: 184
Authors
Cao, Yue [1]
Long, Mingsheng [1]
Wang, Jianmin [1]
Yang, Qiang [3]
Yu, Philip S. [1, 2]
Affiliations
[1] Tsinghua Univ, Sch Software, Tsinghua Natl Lab TNList, Beijing, Peoples R China
[2] Univ Illinois, Chicago, IL USA
[3] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Deep hashing; cross-modal retrieval; multimodal embedding
DOI
10.1145/2939672.2939812
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Owing to its storage and retrieval efficiency, hashing has been widely applied to approximate nearest neighbor search in large-scale multimedia retrieval. Cross-modal hashing, which enables efficient retrieval of images in response to text queries and vice versa, has received increasing attention recently. Most existing work on cross-modal hashing does not capture the spatial dependency of images and the temporal dynamics of text sentences when learning feature representations and cross-modal embeddings that mitigate the heterogeneity between modalities. This paper presents a new Deep Visual-Semantic Hashing (DVSH) model that generates compact hash codes of images and sentences in an end-to-end deep learning architecture; the codes capture the intrinsic cross-modal correspondences between visual data and natural language. DVSH is a hybrid deep architecture that comprises a visual-semantic fusion network for learning a joint embedding space of images and text sentences, and two modality-specific hashing networks for learning hash functions that generate compact binary codes. Our architecture unifies joint multimodal embedding and cross-modal hashing through a novel combination of Convolutional Neural Networks over images, Recurrent Neural Networks over sentences, and a structured max-margin objective that integrates these components to enable learning of similarity-preserving, high-quality hash codes. Extensive empirical evidence shows that DVSH yields state-of-the-art results in cross-modal retrieval experiments on image-sentence datasets, namely the standard IAPR TC-12 and the large-scale Microsoft COCO.
Pages: 1445-1454
Page count: 10
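
The abstract describes compact binary codes for images and sentences that support efficient cross-modal retrieval. The following minimal Python sketch illustrates only the retrieval step that such codes make possible: binarizing real-valued embeddings by sign and ranking database items by Hamming distance to a query. It is not the DVSH model. The embedding values, the 8-bit code length, and the function names (`binarize`, `hamming`, `rank_by_hamming`) are all hypothetical and chosen for illustration; in the paper the embeddings would come from the trained image and sentence hashing networks.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack the signs of an embedding into an integer code.

    Bit i is set to 1 when embedding[i] > 0, otherwise it stays 0.
    """
    code = 0
    for i, value in enumerate(embedding):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: List[int]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Hypothetical 8-dimensional outputs of an image hashing network (assumed values).
image_embeddings = [
    [0.9, -0.2, 0.4, -0.7, 0.1, 0.8, -0.3, -0.5],
    [-0.6, 0.3, -0.8, 0.2, -0.4, -0.1, 0.7, 0.5],
    [0.8, -0.1, 0.5, 0.6, 0.2, 0.9, -0.2, -0.4],
]
# Hypothetical output of a sentence hashing network for a text query (assumed values).
sentence_embedding = [0.7, -0.3, 0.6, -0.5, 0.2, 0.6, -0.1, -0.6]

image_codes = [binarize(e) for e in image_embeddings]
query_code = binarize(sentence_embedding)

# Text-to-image retrieval: images ordered from closest to farthest code.
ranking = rank_by_hamming(query_code, image_codes)
print(ranking)
```

Because the distance is an XOR followed by a bit count on short integer codes, comparing a query against a large database is cheap in both time and storage, which is the efficiency argument the abstract opens with.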