Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval

Cited by: 12
Authors
Liu, Yishu [1 ]
Wu, Qingpeng [1 ]
Zhang, Zheng [1 ,2 ]
Zhang, Jingyi [3 ]
Lu, Guangming [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Huawei Technol Co Ltd, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal hashing; cross-modal retrieval; multi-granularity; transformer; knowledge distillation; contrastive learning;
DOI
10.1145/3581783.3612411
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Owing to its powerful representation ability and high retrieval efficiency, deep cross-modal hashing (DCMH) has become an emerging fast similarity search technique. Prior studies primarily focus on exploring pairwise similarities across modalities, but fail to comprehensively capture the multi-grained semantic correlations during intra- and inter-modal negotiation. To tackle this issue, this paper proposes a novel Multi-granularity Interactive Transformer Hashing (MITH) network, which hierarchically considers both coarse- and fine-grained similarity measurements across different modalities in one unified transformer-based framework. To the best of our knowledge, this is the first attempt at multi-granularity transformer-based cross-modal hashing. Specifically, a well-designed distilled intra-modal interaction module is deployed to excavate modality-specific concept knowledge via global-local knowledge distillation under the guidance of implicit conceptual category-level representations. Moreover, we construct a contrastive inter-modal alignment module to mine modality-independent semantic concept correspondences with instance- and token-wise contrastive learning, respectively. Such a collaborative learning paradigm jointly alleviates the heterogeneity and semantic gaps among different modalities from a multi-granularity perspective, yielding discriminative modality-invariant hash codes. Extensive experiments on multiple representative cross-modal datasets demonstrate the consistent superiority of MITH over existing state-of-the-art baselines. The code is available at https://github.com/DarrenZZhang/MITH.
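The instance-wise contrastive alignment described in the abstract can be illustrated with a generic symmetric InfoNCE loss between paired image/text embeddings, followed by sign-based binarization into hash codes. This is a minimal NumPy sketch of the general technique, not the authors' implementation; all function names, the temperature value, and the toy data are illustrative assumptions.

```python
import numpy as np


def _softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def instance_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of img_emb and txt_emb is a matched image-text pair;
    all other rows in the batch act as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    diag = np.arange(logits.shape[0])
    # Image-to-text: softmax over rows; positives sit on the diagonal.
    i2t = -np.log(_softmax(logits, axis=1)[diag, diag]).mean()
    # Text-to-image: softmax over columns.
    t2i = -np.log(_softmax(logits, axis=0)[diag, diag]).mean()
    return (i2t + t2i) / 2


def to_hash_codes(emb):
    """Binarize continuous embeddings into +/-1 hash codes via sign,
    the usual final step in deep cross-modal hashing."""
    return np.where(emb >= 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy aligned batch: text features are a noisy copy of image features.
    img = rng.normal(size=(8, 16))
    txt = img + 0.05 * rng.normal(size=(8, 16))
    aligned = instance_contrastive_loss(img, txt)
    mismatched = instance_contrastive_loss(img, txt[::-1])
    print(aligned < mismatched)  # True: matched pairs score a lower loss
```

Minimizing such a loss pulls matched cross-modal pairs together and pushes mismatched pairs apart, so the binarized codes of semantically related items end up close in Hamming distance; MITH additionally applies a token-wise variant and knowledge distillation, which this sketch omits.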
Pages: 893-902
Page count: 10
Related Papers
50 items in total
  • [21] Kernelized Cross-Modal Hashing for Multimedia Retrieval
    Tan, Shoubiao
    Hu, Lingyu
    Wang-Xu, Anqi
    Tang, Jun
    Jia, Zhaohong
    PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 1224 - 1228
  • [22] Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval
    Ma, Xinhong
    Zhang, Tianzhu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3101 - 3114
  • [23] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
    Ji, Zhenyan
    Yao, Weina
    Wei, Wei
    Song, Houbing
    Pi, Huaiyu
    IEEE ACCESS, 2019, 7 : 23667 - 23674
  • [24] Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval
    Shen, Xiaobo
    Chen, Yinfan
    Liu, Weiwei
    Zheng, Yuhui
    Sun, Quan-Sen
    Pan, Shirui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [25] Transformer-Based Discriminative and Strong Representation Deep Hashing for Cross-Modal Retrieval
    Zhou, Suqing
    Han, Yu
    Chen, Ning
    Huang, Siyu
    Igorevich, Kostromitin Konstantin
    Luo, Jia
    Zhang, Peiying
    IEEE ACCESS, 2023, 11 : 140041 - 140055
  • [26] Multi-granularity cross-modal representation learning for named entity recognition on social media
    Liu, Peipei
    Wang, Gaosheng
    Li, Hong
    Liu, Jie
    Ren, Yimo
    Zhu, Hongsong
    Sun, Limin
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [27] MULTI-SCALE INTERACTIVE TRANSFORMER FOR REMOTE SENSING CROSS-MODAL IMAGE-TEXT RETRIEVAL
    Wang, Yijing
    Ma, Jingjing
    Li, Mingteng
    Tang, Xu
    Han, Xiao
    Jiao, Licheng
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 839 - 842
  • [28] Label Guided Discrete Hashing for Cross-Modal Retrieval
    Lan, Rushi
    Tan, Yu
    Wang, Xiaoqin
    Liu, Zhenbing
    Luo, Xiaonan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (12) : 25236 - 25248
  • [29] Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval
    Gu, Wen
    Gu, Xiaoyan
    Gu, Jingzi
    Li, Bo
    Xiong, Zhi
    Wang, Weiping
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 159 - 167
  • [30] Semantics-Reconstructing Hashing for Cross-Modal Retrieval
    Zhang, Peng-Fei
    Huang, Zi
    Zhang, Zheng
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 315 - 327