Context-Aware Attention Network for Image-Text Retrieval

Times cited: 181
Authors
Zhang, Qi [1, 2]
Lei, Zhen [1, 2]
Zhang, Zhaoxiang [1, 2]
Li, Stan Z. [3]
Affiliations
[1] Chinese Academy of Sciences, Institute of Automation, NLPR, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, People's Republic of China
[3] Westlake University, Center for AI Research and Innovation, Hangzhou, People's Republic of China
DOI
10.1109/CVPR42600.2020.00359
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a typical cross-modal problem, bidirectional image-text retrieval relies heavily on joint embedding learning and on the similarity measure for each image-text pair. It remains challenging because prior works seldom explore semantic correspondences between modalities and semantic correlations within a single modality at the same time. In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. Specifically, it simultaneously utilizes global inter-modal alignments and intra-modal correlations to discover latent semantic relations. Because images and sentences interact during retrieval, intra-modal correlations are derived from the second-order attention of region-word alignments instead of by intuitively comparing distances between the original features. Our method achieves fairly competitive results on two generic image-text retrieval datasets, Flickr30K and MS-COCO.
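To make the abstract's two ideas concrete (attention that weights each local fragment by aggregated global context, and intra-modal correlations read off the region-word alignments rather than the raw features), here is a minimal Python sketch. It is an illustrative assumption, not CAAN's published formulation: the function names (`context_aware_pooling`, `cosine`), the use of L2-normalized features with no learned projections, and the scoring rule (summing each fragment's inter-modal alignments and second-order correlations before a softmax) are all hypothetical simplifications. The part it is meant to show is that, given a region-word alignment matrix S, second-order intra-modal correlations can be formed as S S^T for regions and S^T S for words.

```python
import math

# Illustrative sketch only: no learned weights, hypothetical scoring rule.


def l2_normalize(vecs):
    """Scale each row vector to unit length (zero vectors are left as-is)."""
    out = []
    for v in vecs:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        out.append([x / norm for x in v])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain (n x k) @ (k x m) matrix product on nested lists."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def context_aware_pooling(regions, words):
    """Pool region and word features into one vector each, weighting every
    fragment by global context (hypothetical simplification of the idea)."""
    v = l2_normalize(regions)           # n_regions x d
    u = l2_normalize(words)             # n_words x d
    # Inter-modal alignments: cosine similarity of every region-word pair.
    s = matmul(v, transpose(u))         # n_regions x n_words
    st = transpose(s)                   # n_words x n_regions
    # Second-order intra-modal correlations derived from the alignments:
    # two regions correlate when they align with the same words, and vice versa.
    s_vv = matmul(s, st)                # n_regions x n_regions
    s_uu = matmul(st, s)                # n_words x n_words
    # Aggregate global context into one score per fragment (assumed rule).
    region_scores = [sum(s[i]) + sum(s_vv[i]) for i in range(len(v))]
    word_scores = [sum(st[j]) + sum(s_uu[j]) for j in range(len(u))]
    a_v = softmax(region_scores)
    a_u = softmax(word_scores)
    # Attention-weighted sums give one image vector and one sentence vector.
    dim = len(v[0])
    img = [sum(a_v[i] * v[i][k] for i in range(len(v))) for k in range(dim)]
    txt = [sum(a_u[j] * u[j][k] for j in range(len(u))) for k in range(dim)]
    return img, txt, a_v, a_u


if __name__ == "__main__":
    regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    words = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
    img, txt, a_v, a_u = context_aware_pooling(regions, words)
    print("region attention:", [round(w, 3) for w in a_v])
    print("image-text similarity:", round(cosine(img, txt), 3))
```

With the toy input in the `__main__` block, both words point mostly along the first axis, so the first region receives the largest attention weight and the unaligned third region the smallest. Deriving intra-modal correlations from S rather than from raw feature distances means that two regions count as related only insofar as the current sentence treats them similarly, which is the interaction the abstract describes.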
Pages: 3533-3542
Number of pages: 10
Related papers (50 in total)
  • [31] Semantic-Enhanced Attention Network for Image-Text Matching
    Zhou, Huanxiao
    Geng, Yushui
    Zhao, Jing
    Ma, Xishan
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 1256 - 1261
  • [32] A Context-aware Attention Network for Interactive Question Answering
    Li, Huayu
    Min, Martin Renqiang
    Ge, Yong
    Kadav, Asim
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 927 - 935
  • [33] Hierarchical Attention Network for Context-Aware Query Suggestion
    Li, Xiangsheng
    Liu, Yiqun
    Li, Xin
    Luo, Cheng
    Nie, Jian-Yun
    Zhang, Min
    Ma, Shaoping
    INFORMATION RETRIEVAL TECHNOLOGY (AIRS 2018), 2018, 11292 : 173 - 186
  • [34] Context-aware pyramid attention network for crowd counting
    Gu, Lingyu
    Pang, Chen
    Zheng, Yanjun
    Lyu, Chen
    Lyu, Lei
    APPLIED INTELLIGENCE, 2022, 52 (06) : 6164 - 6180
  • [35] Context-Aware Attention LSTM Network for Flood Prediction
    Wu, Yirui
    Liu, Zhaoyang
    Xu, Weigang
    Feng, Jun
    Palaiahnakote, Shivakumara
    Lu, Tong
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 1301 - 1306
  • [37] PFAN++: Bi-Directional Image-Text Retrieval With Position Focused Attention Network
    Wang, Yaxiong
    Yang, Hao
    Bai, Xiuxiu
    Qian, Xueming
    Ma, Lin
    Lu, Jing
    Li, Biao
    Fan, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 (23) : 3362 - 3376
  • [38] A Discriminative Convolutional Neural Network with Context-aware Attention
    Zhou, Yuxiang
    Liao, Lejian
    Gao, Yang
    Huang, Heyan
    Wei, Xiaochi
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2020, 11 (05)
  • [39] Graph Attention Network for Context-Aware Visual Tracking
    Shao, Yanyan
    Guo, Dongyan
    Cui, Ying
    Wang, Zhenhua
    Zhang, Liyan
    Zhang, Jianhua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [40] Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching
    Liu, Chunxiao
    Mao, Zhendong
    Liu, An-An
    Zhang, Tianzhu
    Wang, Bin
    Zhang, Yongdong
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 3 - 11