Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval

Cited by: 12
Authors
Liu, Yishu [1 ]
Wu, Qingpeng [1 ]
Zhang, Zheng [1 ,2 ]
Zhang, Jingyi [3 ]
Lu, Guangming [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Huawei Technol Co Ltd, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
Cross-modal hashing; cross-modal retrieval; multi-granularity; transformer; knowledge distillation; contrastive learning;
DOI
10.1145/3581783.3612411
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Owing to their powerful representation ability and high efficiency, deep cross-modal hashing (DCMH) methods have emerged as a fast similarity search technique. Prior studies primarily focus on exploring pairwise similarities across modalities, but fail to comprehensively capture the multi-grained semantic correlations during intra- and inter-modal interaction. To tackle this issue, this paper proposes a novel Multi-granularity Interactive Transformer Hashing (MITH) network, which hierarchically considers both coarse- and fine-grained similarity measurements across modalities in one unified transformer-based framework. To the best of our knowledge, this is the first attempt at multi-granularity transformer-based cross-modal hashing. Specifically, a well-designed distilled intra-modal interaction module is deployed to excavate modality-specific concept knowledge via global-local knowledge distillation under the guidance of implicit conceptual category-level representations. Moreover, we construct a contrastive inter-modal alignment module that mines modality-independent semantic concept correspondences with instance- and token-wise contrastive learning. Such a collaborative learning paradigm jointly alleviates the heterogeneity and semantic gaps among modalities from a multi-granularity perspective, yielding discriminative modality-invariant hash codes. Extensive experiments on multiple representative cross-modal datasets demonstrate the consistent superiority of MITH over existing state-of-the-art baselines. The code is available at https://github.com/DarrenZZhang/MITH.
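To make the abstract's instance-wise contrastive alignment idea concrete, below is a minimal PyTorch sketch: paired image/text embeddings are projected to continuous hash representations, aligned with a symmetric InfoNCE-style loss, and binarized with sign() for retrieval. This is an illustrative assumption, not the MITH implementation (see the repository linked above for the authors' code); the `HashHead` module, the 512-d/64-bit dimensions, and the loss formulation are hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Hypothetical head: projects a modality embedding to K continuous hash bits."""
    def __init__(self, dim: int, n_bits: int):
        super().__init__()
        self.proj = nn.Linear(dim, n_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh keeps outputs in (-1, 1), easing later binarization with sign()
        return torch.tanh(self.proj(x))

def instance_contrastive_loss(h_img, h_txt, temperature: float = 0.07):
    """Symmetric InfoNCE: each image should match its paired text, and vice versa."""
    z_i = F.normalize(h_img, dim=-1)
    z_t = F.normalize(h_txt, dim=-1)
    logits = z_i @ z_t.t() / temperature                     # (B, B) similarities
    targets = torch.arange(z_i.size(0), device=z_i.device)   # diagonal = positive pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random stand-ins for transformer [CLS] embeddings
img_head, txt_head = HashHead(512, 64), HashHead(512, 64)
img_emb, txt_emb = torch.randn(8, 512), torch.randn(8, 512)
loss = instance_contrastive_loss(img_head(img_emb), txt_head(txt_emb))
codes = torch.sign(img_head(img_emb))  # +/-1 hash codes used at retrieval time
```

Token-wise contrastive learning, the finer granularity mentioned in the abstract, would apply the same matching principle to patch/word token embeddings rather than instance-level ones.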
Pages: 893-902
Page count: 10
Related Papers
50 records in total
  • [1] Multi-Granularity Semantic Information Integration Graph for Cross-Modal Hash Retrieval
    Han, Zhichao
    Bin Azman, Azreen
    Khalid, Fatimah Binti
    Mustaffa, Mas Rina Binti
    IEEE ACCESS, 2024, 12 : 44682 - 44694
  • [2] Hugs Bring Double Benefits: Unsupervised Cross-Modal Hashing with Multi-granularity Aligned Transformers
    Wang, Jinpeng
    Zeng, Ziyun
    Chen, Bin
    Wang, Yuting
    Liao, Dongliang
    Li, Gongfu
    Wang, Yiru
    Xia, Shu-Tao
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (08) : 2765 - 2797
  • [3] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171
  • [4] Semantic-alignment transformer and adversary hashing for cross-modal retrieval
    Sun, Yajun
    Wang, Meng
    Ma, Ying
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 7581 - 7602
  • [5] Discriminant Adversarial Hashing Transformer for Cross-modal Vessel Image Retrieval
    Guan X.
    Guo J.
    Lu Y.
Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2023, 45 (12): 4411 - 4420
  • [6] Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval
    Shen, Xiaobo
    Huang, Qianxin
    Lan, Long
    Zheng, Yuhui
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1227 - 1235
  • [7] Hashing for Cross-Modal Similarity Retrieval
    Liu, Yao
    Yuan, Yanhong
    Huang, Qiaoli
    Huang, Zhixing
    2015 11TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2015, : 1 - 8
  • [8] Visual Question Generation Under Multi-granularity Cross-Modal Interaction
    Chai, Zi
    Wan, Xiaojun
    Han, Soyeon Caren
    Poon, Josiah
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 255 - 266
  • [9] ROBUST MULTI-VIEW HASHING FOR CROSS-MODAL RETRIEVAL
    Wang, Haitao
    Chen, Hui
    Meng, Min
    Wu, JiGang
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1012 - 1017