Context-Aware Attention Network for Image-Text Retrieval

Cited by: 181
Authors
Zhang, Qi [1, 2]
Lei, Zhen [1, 2]
Zhang, Zhaoxiang [1, 2]
Li, Stan Z. [3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Westlake Univ, Ctr AI Res & Innovat, Hangzhou, Peoples R China
DOI
10.1109/CVPR42600.2020.00359
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a typical cross-modal problem, image-text bidirectional retrieval relies heavily on joint embedding learning and on the similarity measure for each image-text pair. It remains challenging because prior works seldom explore semantic correspondences between modalities and semantic correlations within a single modality at the same time. In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. Specifically, it simultaneously utilizes global inter-modal alignments and intra-modal correlations to discover latent semantic relations. Considering the interactions between images and sentences in the retrieval process, intra-modal correlations are derived from the second-order attention of region-word alignments instead of by intuitively comparing the distances between original features. Our method achieves fairly competitive results on two generic image-text retrieval datasets, Flickr30K and MS-COCO.
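The abstract describes the mechanism only at a high level. What follows is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes region and word features are already extracted into a shared space, measures region-word alignment with cosine similarity (a matrix S), takes the "second-order" intra-modal correlations to be S S^T for regions and S^T S for words, scores each fragment by summing its inter-modal and intra-modal rows, and pools each modality with a softmax over those scores. The function name caan_style_similarity and this scoring rule are hypothetical; the learned projections and exact attention formulation of CAAN are omitted.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    # a: m x k, b: k x n  ->  m x n
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def caan_style_similarity(regions, words):
    """Hypothetical simplification of context-aware attention (not CAAN's exact method).

    regions: list of n_r feature vectors; words: list of n_w feature vectors
    (same dimensionality assumed). Returns (similarity, region_weights, word_weights).
    """
    v = [l2_normalize(r) for r in regions]
    u = [l2_normalize(w) for w in words]
    # First-order inter-modal alignment: region-word cosine similarities (n_r x n_w).
    s = matmul(v, transpose(u))
    # Assumed second-order intra-modal correlations derived from the alignments:
    # two regions correlate if they align with the same words, and vice versa.
    rr = matmul(s, transpose(s))  # n_r x n_r
    ww = matmul(transpose(s), s)  # n_w x n_w
    # Context score per fragment: its total inter-modal alignment plus intra-modal correlation.
    region_scores = [sum(s[i]) + sum(rr[i]) for i in range(len(v))]
    word_scores = [sum(row[j] for row in s) + sum(ww[j]) for j in range(len(u))]
    a_v = softmax(region_scores)
    a_u = softmax(word_scores)
    # Attention-weighted pooling into one unit vector per modality.
    d = len(v[0])
    img = l2_normalize([sum(a_v[i] * v[i][k] for i in range(len(v))) for k in range(d)])
    txt = l2_normalize([sum(a_u[j] * u[j][k] for j in range(len(u))) for k in range(d)])
    # Image-text similarity as cosine similarity of the pooled vectors.
    return sum(x * y for x, y in zip(img, txt)), a_v, a_u
```

For example, passing identical feature lists for regions and words yields a similarity of (numerically) 1.0, while regions and words that are mutually orthogonal give zero alignment scores, uniform attention weights, and a similarity of 0.0. Deriving the intra-modal terms from S rather than from raw feature distances mirrors the abstract's point that intra-modal correlations should depend on the image-sentence interaction.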
Pages: 3533-3542
Page count: 10
Related papers (50 in total)
  • [21] Low-light image dehazing network with aggregated context-aware attention
    Wang K.
    Cheng J.
    Huang S.
    Cai K.
    Wang W.
    Li Y.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2023, 50 (02): 23 - 32
  • [22] Context-aware and co-attention network based image captioning model
    Sharma, Himanshu
    Srivastava, Swati
    IMAGING SCIENCE JOURNAL, 2023, 71 (03): 244 - 256
  • [23] TFUN: Trilinear Fusion Network for Ternary Image-Text Retrieval
    Xu, Xing
    Sun, Jialiang
    Cao, Zuo
    Zhang, Yin
    Zhu, Xiaofeng
    Shen, Heng Tao
    INFORMATION FUSION, 2023, 91 : 327 - 337
  • [24] Dual Stream Relation Learning Network for Image-Text Retrieval
    Wu, Dongqing
    Li, Huihui
    Gu, Cang
    Guo, Lei
    Liu, Hang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 1551 - 1565
  • [25] Reference-Aware Adaptive Network for Image-Text Matching
    Xiong, Guoxin
    Meng, Meng
    Zhang, Tianzhu
    Zhang, Dongming
    Zhang, Yongdong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9678 - 9691
  • [26] Scene Graph based Fusion Network for Image-Text Retrieval
    Wang, Guoliang
    Shang, Yanlei
    Chen, Yong
    Zhen, Chaoqi
    Cheng, Dequan
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 138 - 143
  • [27] HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
    Guo, Jie
    Wang, Meiting
    Zhou, Yan
    Song, Bin
    Chi, Yuhao
    Fan, Wei
    Chang, Jianglong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9189 - 9202
  • [28] Visual context learning based on textual knowledge for image-text retrieval
    Qin, Yuzhuo
    Gu, Xiaodong
    Tan, Zhenshan
    NEURAL NETWORKS, 2022, 152 : 434 - 449
  • [29] Dual Semantic Relationship Attention Network for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [30] Cross-Modal Remote Sensing Image-Text Retrieval via Context and Uncertainty-Aware Prompt
    Wang, Yijing
    Tang, Xu
    Ma, Jingjing
    Zhang, Xiangrong
    Liu, Fang
    Jiao, Licheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024