CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval

Cited by: 0
Authors
Li Mingyong
Li Yewen
Ge Mingyuan
Ma Longfei
Affiliations
[1] Chongqing Normal University, School of Computer and Information Science
Source
International Journal of Multimedia Information Retrieval | 2023 / Vol. 12
Keywords
Unsupervised cross-modal retrieval; Deep hashing; Autoencoder; Contrastive language-image pre-training
DOI
Not available
Abstract
As multi-modal data proliferates, people are no longer content with a single mode of data retrieval for access to information. Deep hashing retrieval algorithms have attracted much attention for their efficient storage and fast query speed. Existing unsupervised hashing methods generally have two limitations: (1) They fail to adequately capture the latent semantic relevance and coexistent information in data from different modalities, so they lack effective feature and hash-code representations to bridge the heterogeneous and semantic gaps in multi-modal data. (2) They typically construct a similarity matrix to guide hash code learning, and inaccuracies in that matrix lead to sub-optimal retrieval performance. To address these issues, we propose a novel CLIP-based fusion-modal reconstructing hashing method for large-scale unsupervised cross-modal retrieval. First, we use CLIP to encode cross-modal features of visual modalities, and learn the common representation space of the hash code using modality-specific autoencoders. Second, we propose an efficient fusion approach to construct a semantically complementary affinity matrix that can maximize the potential semantic relevance of different modal instances. Furthermore, to retain the intrinsic semantic similarity of all similar pairs in the learned hash codes, an objective function for similarity reconstruction based on semantic complementation is designed to learn high-quality hash code representations. Extensive experiments were carried out on four multi-modal benchmark datasets (WIKI, MIRFLICKR, NUS-WIDE, and MS COCO), and the proposed method achieves state-of-the-art image-text retrieval performance compared to several representative unsupervised cross-modal hashing methods.
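The abstract does not state the exact formulas, so the following is only a minimal sketch of the general idea it describes: fusing per-modality similarities into one affinity matrix and measuring how well binary codes reconstruct it. The cosine similarity, the convex weight `alpha`, the squared-error loss, and all function names are assumptions made for illustration, not the authors' published method.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_matrix(feats):
    """Pairwise cosine similarity between all rows of `feats`."""
    unit = [l2_normalize(v) for v in feats]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def fused_affinity(img_feats, txt_feats, alpha=0.5):
    """Assumed fusion rule: convex combination of the image-side and
    text-side cosine similarity matrices (alpha is a hypothetical weight)."""
    s_img = cosine_matrix(img_feats)
    s_txt = cosine_matrix(txt_feats)
    n = len(s_img)
    return [
        [alpha * s_img[i][j] + (1 - alpha) * s_txt[i][j] for j in range(n)]
        for i in range(n)
    ]


def sign_codes(feats):
    """Binarize real-valued outputs into {-1, +1} hash codes."""
    return [[1 if x >= 0 else -1 for x in v] for v in feats]


def reconstruction_loss(codes_a, codes_b, affinity):
    """Assumed objective: mean squared error between the normalized inner
    product of hash codes and the target affinity matrix."""
    n, k = len(codes_a), len(codes_a[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * b for a, b in zip(codes_a[i], codes_b[j])) / k
            total += (sim - affinity[i][j]) ** 2
    return total / (n * n)


def hamming(a, b):
    """Number of differing bits between two hash codes (used at query time)."""
    return sum(x != y for x, y in zip(a, b))
```

In this sketch, `img_feats` and `txt_feats` stand in for features of paired instances (for example, CLIP embeddings), and retrieval would rank database codes by `hamming` distance to the query code. In a real training setup the codes would come from learned encoders and the sign step would need a differentiable relaxation.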