Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval

Cited by: 12
Authors
Liu, Yishu [1 ]
Wu, Qingpeng [1 ]
Zhang, Zheng [1 ,2 ]
Zhang, Jingyi [3 ]
Lu, Guangming [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Huawei Technol Co Ltd, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal hashing; cross-modal retrieval; multi-granularity; transformer; knowledge distillation; contrastive learning;
DOI
10.1145/3581783.3612411
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Owing to its powerful representation ability and high retrieval efficiency, deep cross-modal hashing (DCMH) has become an emerging fast similarity search technique. Prior studies primarily focus on exploring pairwise similarities across modalities, but fail to comprehensively capture the multi-grained semantic correlations during intra- and inter-modal negotiation. To tackle this issue, this paper proposes a novel Multi-granularity Interactive Transformer Hashing (MITH) network, which hierarchically considers both coarse- and fine-grained similarity measurements across different modalities in one unified transformer-based framework. To the best of our knowledge, this is the first attempt at multi-granularity transformer-based cross-modal hashing. Specifically, a well-designed distilled intra-modal interaction module is deployed to excavate modality-specific concept knowledge via global-local knowledge distillation under the guidance of implicit conceptual category-level representations. Moreover, we construct a contrastive inter-modal alignment module to mine modality-independent semantic concept correspondences with instance- and token-wise contrastive learning, respectively. Such a collaborative learning paradigm jointly alleviates the heterogeneity and semantic gaps among different modalities from a multi-granularity perspective, yielding discriminative modality-invariant hash codes. Extensive experiments on multiple representative cross-modal datasets demonstrate the consistent superiority of MITH over existing state-of-the-art baselines. The code is available at https://github.com/DarrenZZhang/MITH.
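The instance-wise contrastive alignment described in the abstract can be illustrated with a generic symmetric InfoNCE loss between paired image/text embeddings, followed by sign-based binarization into hash codes. This is a minimal NumPy sketch of the general technique, not the authors' implementation; all function names, the temperature value, and the toy data are illustrative assumptions.

```python
import numpy as np


def _softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def instance_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of img_emb and txt_emb is a matched image-text pair;
    all other rows in the batch act as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    diag = np.arange(logits.shape[0])
    # Image-to-text: softmax over rows; positives sit on the diagonal.
    i2t = -np.log(_softmax(logits, axis=1)[diag, diag]).mean()
    # Text-to-image: softmax over columns.
    t2i = -np.log(_softmax(logits, axis=0)[diag, diag]).mean()
    return (i2t + t2i) / 2


def to_hash_codes(emb):
    """Binarize continuous embeddings into +/-1 hash codes via sign,
    the usual final step in deep cross-modal hashing."""
    return np.where(emb >= 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy aligned batch: text features are a noisy copy of image features.
    img = rng.normal(size=(8, 16))
    txt = img + 0.05 * rng.normal(size=(8, 16))
    aligned = instance_contrastive_loss(img, txt)
    mismatched = instance_contrastive_loss(img, txt[::-1])
    print(aligned < mismatched)  # True: matched pairs score a lower loss
```

Minimizing such a loss pulls matched cross-modal pairs together and pushes mismatched pairs apart, so the binarized codes of semantically related items end up close in Hamming distance; MITH additionally applies a token-wise variant and knowledge distillation, which this sketch omits.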
Pages: 893-902
Page count: 10
Related Papers
50 items in total
  • [21] Kernelized Cross-Modal Hashing for Multimedia Retrieval
    Tan, Shoubiao
    Hu, Lingyu
    Wang-Xu, Anqi
    Tang, Jun
    Jia, Zhaohong
    PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 1224 - 1228
  • [22] Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval
    Ma, Xinhong
    Zhang, Tianzhu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3101 - 3114
  • [23] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
    Ji, Zhenyan
    Yao, Weina
    Wei, Wei
    Song, Houbing
    Pi, Huaiyu
    IEEE ACCESS, 2019, 7 : 23667 - 23674
  • [24] Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval
    Shen, Xiaobo
    Chen, Yinfan
    Liu, Weiwei
    Zheng, Yuhui
    Sun, Quan-Sen
    Pan, Shirui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [25] Transformer-Based Discriminative and Strong Representation Deep Hashing for Cross-Modal Retrieval
    Zhou, Suqing
    Han, Yu
    Chen, Ning
    Huang, Siyu
    Igorevich, Kostromitin Konstantin
    Luo, Jia
    Zhang, Peiying
    IEEE ACCESS, 2023, 11 : 140041 - 140055
  • [26] Multi-granularity cross-modal representation learning for named entity recognition on social media
    Liu, Peipei
    Wang, Gaosheng
    Li, Hong
    Liu, Jie
    Ren, Yimo
    Zhu, Hongsong
    Sun, Limin
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [27] MULTI-SCALE INTERACTIVE TRANSFORMER FOR REMOTE SENSING CROSS-MODAL IMAGE-TEXT RETRIEVAL
    Wang, Yijing
    Ma, Jingjing
    Li, Mingteng
    Tang, Xu
    Han, Xiao
    Jiao, Licheng
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 839 - 842
  • [28] Label Guided Discrete Hashing for Cross-Modal Retrieval
    Lan, Rushi
    Tan, Yu
    Wang, Xiaoqin
    Liu, Zhenbing
    Luo, Xiaonan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (12) : 25236 - 25248
  • [29] Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval
    Gu, Wen
    Gu, Xiaoyan
    Gu, Jingzi
    Li, Bo
    Xiong, Zhi
    Wang, Weiping
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 159 - 167
  • [30] Semantics-Reconstructing Hashing for Cross-Modal Retrieval
    Zhang, Peng-Fei
    Huang, Zi
    Zhang, Zheng
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 315 - 327