CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval

Cited by: 0
Authors
Li Mingyong
Li Yewen
Ge Mingyuan
Ma Longfei
Affiliation
[1] Chongqing Normal University,School of Computer and Information Science
Source
International Journal of Multimedia Information Retrieval | 2023 / Vol. 12
Keywords
Unsupervised cross-modal retrieval; Deep hashing; Autoencoder; Contrastive language-image pre-training;
DOI
Not available
Abstract
As multi-modal data proliferates, people are no longer content with a single mode of data retrieval for access to information. Deep hashing retrieval algorithms have attracted much attention for their advantages of efficient storage and fast query speed. Existing unsupervised hashing methods generally have two limitations: (1) They fail to adequately capture the latent semantic relevance and co-occurrence information across different modalities, so they lack effective feature and hash-code representations to bridge the heterogeneous and semantic gaps in multi-modal data. (2) They typically construct a similarity matrix to guide hash-code learning, which suffers from inaccurate similarity estimates and leads to sub-optimal retrieval performance. To address these issues, we propose a novel CLIP-based fusion-modal reconstructing hashing method for large-scale unsupervised cross-modal retrieval. First, we use CLIP to encode cross-modal features of the visual modality, and learn a common representation space for the hash codes using modality-specific autoencoders. Second, we propose an efficient fusion approach to construct a semantically complementary affinity matrix that can maximize the potential semantic relevance of instances across modalities. Furthermore, to retain the intrinsic semantic similarity of all similar pairs in the learned hash codes, an objective function for similarity reconstruction based on semantic complementation is designed to learn high-quality hash-code representations. Extensive experiments were carried out on four multi-modal benchmark datasets (WIKI, MIRFLICKR, NUS-WIDE, and MS COCO), and the proposed method achieves state-of-the-art image-text retrieval performance compared to several representative unsupervised cross-modal hashing methods.
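The abstract describes two ingredients: a fused affinity matrix that combines intra-modal similarities, and a similarity-reconstruction objective that pushes hash-code similarities toward that matrix. The sketch below illustrates the general idea in NumPy; the exact formulation (fusion weights, scaling, the use of CLIP encoders and autoencoders) is not given on this page, so the function names, the linear weighting scheme `alpha`, and the mean-squared reconstruction loss are illustrative assumptions only, not the paper's actual method.

```python
import numpy as np

def cosine_similarity(F):
    """Row-wise cosine-similarity matrix of a feature matrix F (n x d)."""
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    return F @ F.T

def fused_affinity(img_feat, txt_feat, alpha=0.5):
    """Hypothetical fusion of intra-modal similarities into one affinity matrix.

    A simple weighted sum of image-image and text-text cosine similarities,
    standing in for the paper's "semantically complementary" fusion.
    """
    S_img = cosine_similarity(img_feat)
    S_txt = cosine_similarity(txt_feat)
    return alpha * S_img + (1 - alpha) * S_txt

def reconstruction_loss(hash_codes, S):
    """Similarity-reconstruction objective (assumed form):
    push cos(b_i, b_j) of the learned codes toward the target affinity S_ij."""
    S_hat = cosine_similarity(hash_codes)
    return np.mean((S_hat - S) ** 2)
```

In a full pipeline, `img_feat`/`txt_feat` would come from the CLIP encoders, `hash_codes` from the modality-specific autoencoders, and the loss would be minimized by gradient descent; here plain arrays suffice to show the shapes involved.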
Related Papers
50 records in total
  • [21] Unsupervised Deep Hashing via Binary Latent Factor Models for Large-scale Cross-modal Retrieval
    Wu, Gengshen
    Lin, Zijia
    Han, Jungong
    Liu, Li
    Ding, Guiguang
    Zhang, Baochang
    Shen, Jialie
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2854 - 2860
  • [22] Semantics-Reconstructing Hashing for Cross-Modal Retrieval
    Zhang, Peng-Fei
    Huang, Zi
    Zhang, Zheng
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 315 - 327
  • [23] FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal Retrieval
    Liu, Xin
    Wang, Xingzhi
    Yiu-ming Cheung
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6306 - 6320
  • [24] Joint and individual matrix factorization hashing for large-scale cross-modal retrieval
    Wang, Di
    Wang, Quan
    He, Lihuo
    Gao, Xinbo
    Tian, Yumin
    PATTERN RECOGNITION, 2020, 107
  • [25] Semantic-consistent cross-modal hashing for large-scale image retrieval
    Gu, Xuesong
    Dong, Guohua
    Zhang, Xiang
    Lan, Long
    Luo, Zhigang
    NEUROCOMPUTING, 2021, 433 : 181 - 198
  • [26] Cross-Modal Self-Taught Hashing for large-scale image retrieval
    Xie, Liang
    Zhu, Lei
    Pan, Peng
    Lu, Yansheng
    SIGNAL PROCESSING, 2016, 124 : 81 - 92
  • [27] Large-Scale Supervised Hashing for Cross-Modal Retrieval
    Karbil, Loubna
    Daoudi, Imane
    2017 IEEE/ACS 14TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2017, : 803 - 808
  • [28] Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval
    Cheng, Miaomiao
    Jing, Liping
    Ng, Michael K.
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2020, 38 (03)
  • [29] CLIP4Hashing: Unsupervised Deep Hashing for Cross-Modal Video-Text Retrieval
    Zhuo, Yaoxin
    Li, Yikang
    Hsiao, Jenhao
    Ho, Chiuman
    Li, Baoxin
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 158 - 166
  • [30] Enhanced-Similarity Attention Fusion for Unsupervised Cross-Modal Hashing Retrieval
    Li, Mingyong
    Ge, Mingyuan
    DATA SCIENCE AND ENGINEERING, 2025,