Context-Aware Attention Network for Image-Text Retrieval

Cited by: 181

Authors:

Zhang, Qi [1,2]

Lei, Zhen [1,2]

Zhang, Zhaoxiang [1,2]

Li, Stan Z. [3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] Westlake Univ, Ctr AI Res & Innovat, Hangzhou, Peoples R China

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020

DOI:

10.1109/CVPR42600.2020.00359

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

As a typical cross-modal problem, image-text bidirectional retrieval relies heavily on the joint embedding learning and similarity measure for each image-text pair. It remains challenging because prior works seldom explore semantic correspondences between modalities and semantic correlations in a single modality at the same time. In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. Specifically, it simultaneously utilizes global inter-modal alignments and intra-modal correlations to discover latent semantic relations. Considering the interactions between images and sentences in the retrieval process, intra-modal correlations are derived from the second-order attention of region-word alignments instead of intuitively comparing the distance between original features. Our method achieves fairly competitive results on two generic image-text retrieval datasets Flickr30K and MS-COCO.

Pages: 3533-3542

Page count: 10

