Context-Aware Attention Network for Image-Text Retrieval

Cited by: 181
Authors
Zhang, Qi [1 ,2 ]
Lei, Zhen [1 ,2 ]
Zhang, Zhaoxiang [1 ,2 ]
Li, Stan Z. [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Westlake Univ, Ctr AI Res & Innovat, Hangzhou, Peoples R China
DOI
10.1109/CVPR42600.2020.00359
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a typical cross-modal problem, image-text bidirectional retrieval relies heavily on joint embedding learning and on the similarity measure for each image-text pair. It remains challenging because prior works seldom explore semantic correspondences between modalities and semantic correlations within a single modality at the same time. In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. Specifically, it simultaneously utilizes global inter-modal alignments and intra-modal correlations to discover latent semantic relations. Considering the interactions between images and sentences in the retrieval process, intra-modal correlations are derived from the second-order attention of region-word alignments instead of intuitively comparing the distance between original features. Our method achieves fairly competitive results on two generic image-text retrieval datasets, Flickr30K and MS-COCO.
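The idea of deriving intra-modal correlations from region-word alignments, rather than from distances between the original features, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `context_aware_attention`, the `temperature` parameter, the softmax normalisation, and the mean-based aggregation of context are all assumptions chosen to keep the example short.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def context_aware_attention(regions, words, temperature=0.1):
    """Illustrative sketch only (hypothetical helper, not the paper's code).

    regions: (n_regions, d) image region features
    words:   (n_words, d)   word features
    """
    v = l2_normalize(regions)
    w = l2_normalize(words)

    # Inter-modal alignment: cosine affinity between every region and word.
    affinity = v @ w.T                                    # (n_regions, n_words)
    region_to_word = softmax(affinity / temperature, axis=1)
    word_to_region = softmax(affinity.T / temperature, axis=1)

    # Intra-modal correlation as second-order attention: two regions are
    # related when they align to the same words, and vice versa for words.
    intra_regions = region_to_word @ word_to_region       # (n_regions, n_regions)
    intra_words = word_to_region @ region_to_word         # (n_words, n_words)

    # Assumed aggregation: average the attention each fragment receives.
    # Both matrices are row-stochastic, so the column means sum to 1.
    region_attn = intra_regions.mean(axis=0)
    word_attn = intra_words.mean(axis=0)

    # Attention-weighted pooling, then cosine similarity of the pooled vectors.
    image_emb = l2_normalize(region_attn @ regions)
    text_emb = l2_normalize(word_attn @ words)
    return {
        "region_attn": region_attn,
        "word_attn": word_attn,
        "similarity": float(image_emb @ text_emb),
    }


rng = np.random.default_rng(0)
result = context_aware_attention(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
print(result["region_attn"].shape, result["word_attn"].shape)
```

In this sketch the region-region matrix never compares region features directly; it is built only from how regions and words attend to each other, which is the sense of "second-order" assumed here.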
Pages: 3533 - 3542
Number of pages: 10
Related papers
  • [1] Context-aware relation enhancement and similarity reasoning for image-text retrieval
    Cui, Zheng
    Hu, Yongli
    Sun, Yanfeng
    Yin, Baocai
    IET COMPUTER VISION, 2024, 18 (05) : 652 - 665
  • [2] Global Relation-Aware Attention Network for Image-Text Retrieval
    Cao, Jie
    Qian, Shengsheng
    Zhang, Huaiwen
    Fang, Quan
    Xu, Changsheng
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 19 - 28
  • [3] Context-Aware Multi-View Summarization Network for Image-Text Matching
    Qu, Leigang
    Liu, Meng
    Cao, Da
    Nie, Liqiang
    Tian, Qi
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1047 - 1055
  • [4] Context-aware attention network for image recognition
    Jiaxu Leng
    Ying Liu
    Shang Chen
    Neural Computing and Applications, 2019, 31 : 9295 - 9305
  • [5] Context-aware attention network for image recognition
    Leng, Jiaxu
    Liu, Ying
    Chen, Shang
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (12) : 9295 - 9305
  • [6] Rare-aware attention network for image-text matching
    Wang, Yan
    Su, Yuting
    Li, Wenhui
    Sun, Zhengya
    Wei, Zhiqiang
    Nie, Jie
    Li, Xuanya
    Liu, An-An
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [7] Cross Attention Graph Matching Network for Image-Text Retrieval
    Yang, Xiaoyu
    Xie, Hao
    Mao, Junyi
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 274 - 286
  • [8] Global-aware Fragment Representation Aggregation Network for image-text retrieval
    Wang, Di
    Tian, Jiabo
    Liang, Xiao
    Tian, Yumin
    He, Lihuo
    PATTERN RECOGNITION, 2025, 159
  • [9] Flexible graph-based attention and pooling network for image-text retrieval
    Sun, Hao
    Qin, Xiaolin
    Liu, Xiaojing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (19) : 57895 - 57912
  • [10] Context-aware Attention Network for Predicting Image Aesthetic Subjectivity
    Xu, Munan
    Zhong, Jia-Xing
    Ren, Yurui
    Liu, Shan
    Li, Ge
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 798 - 806