Unsupervised deep hashing with multiple similarity preservation for cross-modal image-text retrieval

Cited by: 0
Authors
Xiong, Siyu [1 ]
Pan, Lili [1 ]
Ma, Xueqiang [1 ]
Hu, Qinghua [2 ]
Beckman, Eric [3 ]
Affiliations
[1] Cent South Univ Forestry & Technol, Coll Comp Sci & Informat Technol, Changsha 410004, Peoples R China
[2] Tianjin Univ, Sch Artificial Intelligence, Tianjin 300457, Peoples R China
[3] Florida Int Univ, Chaplin Sch Hospitality & Tourism Management, North Miami, FL 33181 USA
Keywords
Cross-modal retrieval; Deep hashing; Unsupervised learning; Similarity preservation; Image-text retrieval;
DOI
10.1007/s13042-024-02154-y
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep hashing for cross-modal image-text retrieval offers low storage cost and high retrieval efficiency by mapping data from different modalities into a common Hamming space. However, existing unsupervised deep hashing methods generally rely on the intrinsic similarity information of each modality for structural matching, failing to fully account for the heterogeneous characteristics and semantic gaps between modalities, which results in the loss of latent semantic correlation and co-occurrence information across modalities. To address this problem, this paper proposes an unsupervised deep hashing with multiple similarity preservation (UMSP) method for cross-modal image-text retrieval. First, to enhance the representation ability of the deep features of each modality, a modality-specific image-text feature extraction module is designed. Specifically, an image network with a parallel structure and a text network are built from a vision-language pre-training image encoder and multi-layer perceptrons to capture the deep semantic information of each modality and learn a common hash code representation space. Then, to bridge the heterogeneous gap and improve the discriminability of hash codes, a multiple similarity preservation module is built on three perspectives: the joint modal space, the cross-modal hash space, and the image modal space, which helps the network preserve the semantic similarity of the modalities. Experimental results on three benchmark datasets (Wikipedia, MIRFlickr-25K and NUS-WIDE) show that UMSP outperforms other unsupervised methods for cross-modal image-text retrieval.
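The core mechanism the abstract describes, binarizing modality-specific deep features into a shared code space and ranking by Hamming distance, can be illustrated with a minimal NumPy sketch. This is not code from the paper: the random projections `W_img` and `W_txt` stand in for UMSP's trained hash layers, and the Gaussian vectors stand in for encoder outputs; only the binarize-then-rank retrieval step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features, W):
    # Project features into a shared k-bit space, then binarize with sign()
    # so each sample becomes a vector of +/-1 "bits".
    return np.sign(features @ W)

def hamming_distance(query, codes):
    # For +/-1 codes of length k: d_H(a, b) = (k - a.b) / 2.
    k = codes.shape[1]
    return (k - codes @ query) / 2

# Toy "deep features" standing in for image/text encoder outputs.
img_feats = rng.normal(size=(5, 64))   # 5 database images
txt_feat = rng.normal(size=64)         # 1 text query

# Random projections standing in for learned hash layers (illustration only).
W_img = rng.normal(size=(64, 16))
W_txt = rng.normal(size=(64, 16))

db_codes = hash_codes(img_feats, W_img)          # 16-bit image codes
query_code = hash_codes(txt_feat[None], W_txt)[0]  # 16-bit text code

# Text-to-image retrieval: rank database images by Hamming distance.
ranking = np.argsort(hamming_distance(query_code, db_codes))
print(ranking)
```

In a trained system the two projections are learned jointly (here, under UMSP's multiple similarity-preservation losses) so that semantically matching image-text pairs land close in Hamming distance; with random projections the ranking above is arbitrary.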
Pages: 4423 - 4434
Page count: 12
Related Papers
50 records in total
  • [1] Deep Rank Cross-Modal Hashing with Semantic Consistent for Image-Text Retrieval
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4828 - 4832
  • [2] Deep Hashing Similarity Learning for Cross-Modal Retrieval
    Ma, Ying
    Wang, Meng
    Lu, Guangyun
    Sun, Yajun
    [J]. IEEE ACCESS, 2024, 12 : 8609 - 8618
  • [3] Cross-modal Image-Text Retrieval with Multitask Learning
    Luo, Junyu
    Shen, Ying
    Ao, Xiang
    Zhao, Zhou
    Yang, Min
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2309 - 2312
  • [4] Rethinking Benchmarks for Cross-modal Image-text Retrieval
    Chen, Weijing
    Yao, Linli
    Jin, Qin
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1241 - 1251
  • [5] Cross-Modal Image-Text Retrieval with Semantic Consistency
    Chen, Hui
    Ding, Guiguang
    Lin, Zijin
    Zhao, Sicheng
    Han, Jungong
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1749 - 1757
  • [6] Hashing for Cross-Modal Similarity Retrieval
    Liu, Yao
    Yuan, Yanhong
    Huang, Qiaoli
    Huang, Zhixing
    [J]. 2015 11TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2015, : 1 - 8
  • [7] CLIP4Hashing: Unsupervised Deep Hashing for Cross-Modal Video-Text Retrieval
    Zhuo, Yaoxin
    Li, Yikang
    Hsiao, Jenhao
    Ho, Chiuman
    Li, Baoxin
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 158 - 166
  • [8] Deep Unsupervised Momentum Contrastive Hashing for Cross-modal Retrieval
    Lu, Kangkang
    Yu, Yanhua
    Liang, Meiyu
    Zhang, Min
    Cao, Xiaowen
    Zhao, Zehua
    Yin, Mengran
    Xue, Zhe
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 126 - 131
  • [9] Unsupervised Deep Imputed Hashing for Partial Cross-modal Retrieval
    Chen, Dong
    Cheng, Miaomiao
    Min, Chen
    Jing, Liping
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [10] Deep semantic similarity adversarial hashing for cross-modal retrieval
    Qiang, Haopeng
    Wan, Yuan
    Xiang, Lun
    Meng, Xiaojing
    [J]. NEUROCOMPUTING, 2020, 400 : 24 - 33