CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval

Cited by: 0
Authors
Li Mingyong
Li Yewen
Ge Mingyuan
Ma Longfei
Affiliations
[1] Chongqing Normal University, School of Computer and Information Science
Keywords
Unsupervised cross-modal retrieval; Deep hashing; Autoencoder; Contrastive language-image pre-training
DOI
Not available
Abstract
As multi-modal data proliferate, users are no longer satisfied with retrieving information through a single modality. Deep hashing retrieval algorithms have attracted considerable attention for their efficient storage and fast query speed. Existing unsupervised hashing methods generally suffer from two limitations: (1) they fail to adequately capture the latent semantic relevance and co-existent information across modalities, so the learned features and hash codes cannot effectively bridge the heterogeneity and semantic gaps in multi-modal data; (2) they typically construct a similarity matrix to guide hash-code learning, and inaccuracies in this matrix lead to sub-optimal retrieval performance. To address these issues, we propose a novel CLIP-based fusion-modal reconstructing hashing method for large-scale unsupervised cross-modal retrieval. First, we use CLIP to encode cross-modal features of the visual modality and learn a common hash-code representation space with modality-specific autoencoders. Second, we propose an efficient fusion approach to construct a semantically complementary affinity matrix that maximizes the potential semantic relevance between instances of different modalities. Furthermore, to preserve the intrinsic semantic similarity of all similar pairs in the learned hash codes, we design a similarity-reconstruction objective based on semantic complementation to learn high-quality hash-code representations. Extensive experiments on four multi-modal benchmark datasets (WIKI, MIRFLICKR, NUS-WIDE, and MS COCO) show that the proposed method achieves state-of-the-art image-text retrieval performance compared with several representative unsupervised cross-modal hashing methods.
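The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired together, assuming CLIP features (e.g., 512-d ViT-B/32 embeddings) are precomputed offline. The module and function names (ModalityAutoencoder, fused_affinity, similarity_reconstruction_loss), the 64-bit code length, the fusion weight alpha, and the scaling factor gamma are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch only: names, dimensions, and loss weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAutoencoder(nn.Module):
    """Modality-specific autoencoder: CLIP feature -> relaxed hash code -> reconstruction."""
    def __init__(self, feat_dim=512, code_len=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, code_len), nn.Tanh())   # tanh: continuous relaxation of {-1, +1}
        self.decoder = nn.Sequential(
            nn.Linear(code_len, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim))

    def forward(self, x):
        code = self.encoder(x)       # relaxed hash code
        recon = self.decoder(code)   # reconstructed CLIP feature
        return code, recon

def cosine_affinity(feats):
    f = F.normalize(feats, dim=1)
    return f @ f.t()

def fused_affinity(img_feats, txt_feats, alpha=0.5):
    """One plausible 'semantically complementary' fusion: a weighted sum of the
    intra-modal cosine-similarity matrices of the two CLIP feature sets."""
    return alpha * cosine_affinity(img_feats) + (1 - alpha) * cosine_affinity(txt_feats)

def similarity_reconstruction_loss(img_code, txt_code, S, gamma=1.0):
    """Push the cross-modal similarity of the learned codes toward the fused affinity S."""
    code_sim = F.normalize(img_code, dim=1) @ F.normalize(txt_code, dim=1).t()
    return F.mse_loss(gamma * code_sim, S)

# Toy usage with random stand-ins for precomputed CLIP image/text features.
img_feats, txt_feats = torch.randn(8, 512), torch.randn(8, 512)
img_ae, txt_ae = ModalityAutoencoder(), ModalityAutoencoder()
img_code, img_recon = img_ae(img_feats)
txt_code, txt_recon = txt_ae(txt_feats)
S = fused_affinity(img_feats, txt_feats)
loss = (similarity_reconstruction_loss(img_code, txt_code, S)
        + F.mse_loss(img_recon, img_feats) + F.mse_loss(txt_recon, txt_feats))
loss.backward()
binary_codes = torch.sign(img_code).detach()  # binarize only at retrieval time
```

The sketch uses tanh as a continuous relaxation of the binary codes and applies sign() only at retrieval time, a common choice in deep hashing; the paper's actual fusion rule and loss formulation may differ.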
Related papers
50 records in total
  • [1] CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval
    Li, Mingyong
    Li, Yewen
    Ge, Mingyuan
    Ma, Longfei
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (01)
  • [2] Unsupervised Deep Cross-Modal Hashing by Knowledge Distillation for Large-scale Cross-modal Retrieval
    Li, Mingyong
    Wang, Hongya
    [J]. PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 183 - 191
  • [3] CKDH: CLIP-Based Knowledge Distillation Hashing for Cross-Modal Retrieval
    Li, Jiaxing
    Wong, Wai Keung
    Jiang, Lin
    Fang, Xiaozhao
    Xie, Shengli
    Xu, Yong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6530 - 6541
  • [4] Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval
    Su, Shupeng
    Zhong, Zhisheng
    Zhang, Chao
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3027 - 3035
  • [5] CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval
    Li, Yewen
    Ge, Mingyuan
    Li, Mingyong
    Li, Tiansong
    Xiang, Sen
    [J]. SENSORS, 2023, 23 (07)
  • [6] Joint-modal Distribution-based Similarity Hashing for Large-scale Unsupervised Deep Cross-modal Retrieval
    Liu, Song
    Qian, Shengsheng
    Guan, Yang
    Zhan, Jiawei
    Ying, Long
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1379 - 1388
  • [7] Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval
    Xie, Liang
    Zhu, Lei
    Chen, Guoqi
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (15) : 9185 - 9204
  • [8] Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval
    Liang Xie
    Lei Zhu
    Guoqi Chen
    [J]. Multimedia Tools and Applications, 2016, 75 : 9185 - 9204
  • [9] Self-Attentive CLIP Hashing for Unsupervised Cross-Modal Retrieval
    Yu, Heng
    Ding, Shuyan
    Li, Lunbo
    Wu, Jiexin
    [J]. PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA, MMASIA 2022, 2022
  • [10] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    [J]. COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171