Step-Wise Hierarchical Alignment Network for Image-Text Matching

Cited by: 0

Authors:

Ji, Zhong [1]

Chen, Kexin [1]

Wang, Haoran [1]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

Source:

PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Image-text matching plays a central role in bridging the semantic gap between vision and language. The key to precise visual-semantic alignment lies in capturing the fine-grained cross-modal correspondence between image and text. Most previous methods rely on single-step reasoning to discover visual-semantic interactions, which limits their ability to exploit multi-level information when locating hierarchical fine-grained relevance. In contrast, in this work we propose a step-wise hierarchical alignment network (SHAN) that decomposes image-text matching into a multi-step cross-modal reasoning process. Specifically, we first achieve local-to-local alignment at the fragment level, followed by global-to-local and then global-to-global alignment at the context level. This progressive alignment strategy supplies our model with more complementary and sufficient semantic clues for understanding the hierarchical correlations between image and text. Experimental results on two benchmark datasets demonstrate the superiority of the proposed method.
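The three alignment granularities named in the abstract can be illustrated with a toy sketch. This is not the SHAN architecture: the abstract gives no implementation details, so the cosine similarity, the mean pooling used to form "global" vectors, the plain averaging of the three scores, and all function names below are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Average a list of vectors into one global vector (assumed pooling)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def local_to_local(regions, words):
    """Fragment level: match each word to its most similar image region."""
    return sum(max(cosine(w, r) for r in regions) for w in words) / len(words)

def global_to_local(regions, words):
    """Context level: compare the global image vector with each word."""
    g_img = mean_vector(regions)
    return sum(cosine(g_img, w) for w in words) / len(words)

def global_to_global(regions, words):
    """Context level: compare the global image and global text vectors."""
    return cosine(mean_vector(regions), mean_vector(words))

def stepwise_score(regions, words):
    """Run the three steps in order and average them (hypothetical fusion)."""
    steps = (local_to_local, global_to_local, global_to_global)
    return sum(step(regions, words) for step in steps) / len(steps)

# Hypothetical 2-D features: two image regions and two words.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
score = stepwise_score(regions, words)
```

In this toy setting each step looks at the same pair from a different level of detail, which is the sense in which the scores are complementary; the actual model presumably learns these interactions rather than computing fixed similarities.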

Pages: 765-771

Number of pages: 7
