Context-Aware Attention Network for Image-Text Retrieval

Cited by: 181

Authors:

Zhang, Qi [1,2]

Lei, Zhen [1,2]

Zhang, Zhaoxiang [1,2]

Li, Stan Z. [3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] Westlake Univ, Ctr AI Res & Innovat, Hangzhou, Peoples R China

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020

DOI:

10.1109/CVPR42600.2020.00359

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As a typical cross-modal problem, image-text bidirectional retrieval relies heavily on the joint embedding learning and similarity measure for each image-text pair. It remains challenging because prior works seldom explore semantic correspondences between modalities and semantic correlations in a single modality at the same time. In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. Specifically, it simultaneously utilizes global inter-modal alignments and intra-modal correlations to discover latent semantic relations. Considering the interactions between images and sentences in the retrieval process, intra-modal correlations are derived from the second-order attention of region-word alignments instead of intuitively comparing the distance between original features. Our method achieves fairly competitive results on two generic image-text retrieval datasets Flickr30K and MS-COCO.
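
The abstract's central idea, deriving intra-modal correlations from region-word alignments rather than from raw feature distances, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `context_aware_attention`, the use of cosine similarity with softmax normalisation, and the equal-weight averaging of inter- and intra-modal context are all assumptions made for illustration, and the published model may differ (for example by using learned projections).

```python
import numpy as np


def _softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def context_aware_attention(regions, words):
    """Illustrative sketch only (hypothetical helper, not the paper's code).

    regions: (n_regions, d) image region features
    words:   (n_words, d)   word features
    Returns region weights, word weights, and an image-sentence similarity.
    """
    # Assumption: compare fragments with cosine similarity.
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)

    # Inter-modal alignment: one score per (region, word) pair.
    align = r @ w.T                      # (n_regions, n_words)
    r2w = _softmax(align, axis=1)        # each region attends over words
    w2r = _softmax(align.T, axis=1)      # each word attends over regions

    # Second-order attention: intra-modal correlations obtained by
    # routing through the other modality instead of comparing raw features.
    rr = r2w @ w2r                       # (n_regions, n_regions)
    ww = w2r @ r2w                       # (n_words, n_words)

    # Aggregate global context: how much attention each fragment receives
    # from the other modality (inter) and from its own modality (intra).
    # Column means of row-stochastic matrices already sum to 1.
    region_attn = 0.5 * (w2r.mean(axis=0) + rr.mean(axis=0))
    word_attn = 0.5 * (r2w.mean(axis=0) + ww.mean(axis=0))

    # Attention-weighted pooling into one vector per modality.
    img_vec = region_attn @ regions
    txt_vec = word_attn @ words
    sim = float(img_vec @ txt_vec /
                (np.linalg.norm(img_vec) * np.linalg.norm(txt_vec)))
    return region_attn, word_attn, sim
```

Calling `context_aware_attention` with, say, 36 region vectors and 12 word vectors of equal dimension returns two weight vectors that each sum to 1 and a cosine similarity for the pair; ranking candidates by that similarity gives a toy version of bidirectional retrieval.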

Pages: 3533-3542

Page count: 10
