CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval

Cited by: 0
Authors
Li Mingyong
Li Yewen
Ge Mingyuan
Ma Longfei
Affiliation
[1] School of Computer and Information Science, Chongqing Normal University
Source
International Journal of Multimedia Information Retrieval | 2023 / Vol. 12
Keywords
Unsupervised cross-modal retrieval; Deep hashing; Autoencoder; Contrastive language-image pre-training
DOI
Not available
Abstract
As multi-modal data proliferates, users are no longer content with retrieving information through a single modality. Deep hashing retrieval algorithms have attracted much attention because they offer efficient storage and fast query speed. Existing unsupervised hashing methods generally have two limitations: (1) they fail to adequately capture the latent semantic relevance and co-existing information across data from different modalities, and therefore lack effective feature and hash-code representations to bridge the heterogeneity and semantic gaps in multi-modal data; (2) they typically construct a similarity matrix to guide hash-code learning, but this matrix is often inaccurate, which leads to sub-optimal retrieval performance. To address these issues, we propose a novel CLIP-based fusion-modal reconstructing hashing method for large-scale unsupervised cross-modal retrieval. First, we use CLIP to encode cross-modal features of the visual modality and learn a common hash-code representation space with modality-specific autoencoders. Second, we propose an efficient fusion approach that constructs a semantically complementary affinity matrix designed to maximize the latent semantic relevance among instances of different modalities. Furthermore, to preserve the intrinsic semantic similarity of all similar pairs in the learned hash codes, we design a similarity-reconstruction objective based on semantic complementation to learn high-quality hash-code representations. Extensive experiments on four multi-modal benchmark datasets (WIKI, MIRFLICKR, NUS-WIDE, and MS COCO) show that the proposed method achieves state-of-the-art image-text retrieval performance compared with several representative unsupervised cross-modal hashing methods.
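The abstract does not give the exact fusion rule or loss. As a rough illustration only, the sketch below assumes pre-extracted (e.g. CLIP) feature vectors stored as plain lists, fuses the intra-image and intra-text cosine affinities with a simple weighted sum controlled by a hypothetical parameter `alpha`, and scores how well cross-modal code similarities reconstruct that fused matrix with a mean-squared error. The function names (`fused_affinity`, `reconstruction_loss`, `binarize`, `hamming`) are illustrative assumptions, not the authors' implementation; in actual training the continuous codes would come from the modality-specific autoencoders and the loss would be minimized by gradient descent, which this sketch does not do.

```python
import math
from typing import List

Matrix = List[List[float]]


def _normalize(vectors: Matrix) -> Matrix:
    """L2-normalize each row; all-zero rows are left unchanged."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v] if norm > 0 else list(v))
    return out


def cosine_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    na, nb = _normalize(a), _normalize(b)
    return [[sum(x * y for x, y in zip(u, w)) for w in nb] for u in na]


def fused_affinity(img_feats: Matrix, txt_feats: Matrix, alpha: float = 0.5) -> Matrix:
    """Hypothetical fusion: convex mix of intra-image and intra-text affinities.

    ASSUMPTION: the paper's actual fusion rule is not stated in the abstract;
    this weighted sum only illustrates combining complementary modalities.
    """
    s_img = cosine_matrix(img_feats, img_feats)
    s_txt = cosine_matrix(txt_feats, txt_feats)
    return [[alpha * x + (1 - alpha) * y for x, y in zip(ri, rt)]
            for ri, rt in zip(s_img, s_txt)]


def reconstruction_loss(img_codes: Matrix, txt_codes: Matrix, target: Matrix) -> float:
    """Mean squared gap between cross-modal code similarity and the target affinity."""
    sim = cosine_matrix(img_codes, txt_codes)
    n = len(target)
    total = sum((sim[i][j] - target[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


def binarize(codes: Matrix) -> List[List[int]]:
    """Map continuous codes to {-1, +1} hash bits with the sign function."""
    return [[1 if x >= 0 else -1 for x in v] for v in codes]


def hamming(b1: List[int], b2: List[int]) -> int:
    """Hamming distance for +/-1 codes of length K: (K - <b1, b2>) / 2."""
    k = len(b1)
    return (k - sum(x * y for x, y in zip(b1, b2))) // 2


if __name__ == "__main__":
    # Toy stand-ins for CLIP features of two image-text pairs (made-up numbers).
    img = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]
    txt = [[0.8, 0.2, 0.1], [0.0, 0.7, 0.4]]
    S = fused_affinity(img, txt, alpha=0.6)
    # Here the features themselves stand in for the autoencoder's continuous codes.
    print("loss:", reconstruction_loss(img, txt, S))
    bits_img, bits_txt = binarize(img), binarize(txt)
    print("hamming(img0, txt0):", hamming(bits_img[0], bits_txt[0]))
```

The weighted sum keeps the fused matrix symmetric with unit diagonal, so it can serve directly as a regression target; a learned or nonlinear fusion would be an equally plausible reading of the abstract.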
Related papers
50 in total
  • [41] Unsupervised Deep Imputed Hashing for Partial Cross-modal Retrieval
    Chen, Dong
    Cheng, Miaomiao
    Min, Chen
    Jing, Liping
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [42] RETRACTED: Deep Unsupervised Hashing for Large-Scale Cross-Modal Retrieval Using Knowledge Distillation Model (Retracted Article)
    Li, Mingyong
    Li, Qiqi
    Tang, Lirong
    Peng, Shuang
    Ma, Yan
    Yang, Degang
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [43] Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval
    Li, Chao
    Deng, Cheng
    Wang, Lei
    Xie, De
    Liu, Xianglong
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019 : 176 - 183
  • [44] Revising similarity relationship hashing for unsupervised cross-modal retrieval
    Wu, You
    Li, Bo
    Li, Zhixin
    NEUROCOMPUTING, 2025, 614
  • [45] Unsupervised Contrastive Cross-Modal Hashing
    Hu, Peng
    Zhu, Hongyuan
    Lin, Jie
    Peng, Dezhong
    Zhao, Yin-Ping
    Peng, Xi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3877 - 3889
  • [46] Completely Unsupervised Cross-Modal Hashing
    Duan, Jiasheng
    Zhang, Pengfei
    Huang, Zi
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT I, 2020, 12112 : 178 - 194
  • [47] SCQ: Self-Supervised Cross-Modal Quantization for Unsupervised Large-Scale Retrieval
    Nakamura, Fuga
    Harakawa, Ryosuke
    Iwahashi, Masahiro
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022 : 1337 - 1342
  • [48] Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval
    Zhang, Peng-Fei
    Li, Yang
    Huang, Zi
    Xu, Xin-Shun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 466 - 479
  • [49] SEMI-SUPERVISED GRAPH CONVOLUTIONAL HASHING NETWORK FOR LARGE-SCALE CROSS-MODAL RETRIEVAL
    Shen, Zhanjian
    Zhai, Deming
    Liu, Xianming
    Jiang, Junjun
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020 : 2366 - 2370
  • [50] Semantics-preserving hashing based on multi-scale fusion for cross-modal retrieval
    Zhang, Hong
    Pan, Min
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 17299 - 17314