Context-Aware Attention Network for Image-Text Retrieval

Cited by: 181
Authors
Zhang, Qi [1, 2]
Lei, Zhen [1, 2]
Zhang, Zhaoxiang [1, 2]
Li, Stan Z. [3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Westlake Univ, Ctr AI Res & Innovat, Hangzhou, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020
DOI
10.1109/CVPR42600.2020.00359
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a typical cross-modal problem, image-text bidirectional retrieval relies heavily on joint embedding learning and the similarity measure for each image-text pair. It remains challenging because prior works seldom explore semantic correspondences between modalities and semantic correlations within a single modality at the same time. In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. Specifically, it simultaneously utilizes global inter-modal alignments and intra-modal correlations to discover latent semantic relations. Considering the interactions between images and sentences in the retrieval process, intra-modal correlations are derived from the second-order attention of region-word alignments instead of intuitively comparing the distance between original features. Our method achieves fairly competitive results on two generic image-text retrieval datasets, Flickr30K and MS-COCO.
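The idea of deriving intra-modal correlations from "second-order attention of region-word alignments" can be illustrated with a small sketch. This is one plausible reading of the abstract, not the authors' implementation: the dot-product affinity, the softmax normalization, and every function name below are assumptions made for illustration, and the published model may use learned projections or a different normalization.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def second_order_attention(regions, words):
    """Hypothetical helper: regions is n_regions x d, words is n_words x d."""
    # First-order (inter-modal) alignment: dot-product affinity (an assumption).
    affinity = matmul(regions, transpose(words))                     # n_regions x n_words
    region_to_word = [softmax(r) for r in affinity]                  # regions attend to words
    word_to_region = [softmax(r) for r in transpose(affinity)]       # words attend to regions
    # Second-order (intra-modal) correlation: two regions are related when they
    # align with the same words, rather than when their raw features are close.
    region_to_region = matmul(region_to_word, word_to_region)        # n_regions x n_regions
    word_to_word = matmul(word_to_region, region_to_word)            # n_words x n_words
    return region_to_word, region_to_region, word_to_word

# Toy features: 3 regions and 2 words in a 2-dimensional joint space.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
r2w, r2r, w2w = second_order_attention(regions, words)
for row in r2r:
    print([round(x, 3) for x in row])
```

Because each factor is row-normalized, every row of the region-to-region and word-to-word matrices also sums to 1, so each can be read as an attention distribution within a single modality.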
Pages: 3533-3542
Page count: 10