Rare-aware attention network for image-text matching

Times cited: 7
Authors
Wang, Yan [1, 2]
Su, Yuting [1]
Li, Wenhui [1]
Sun, Zhengya [3]
Wei, Zhiqiang [4]
Nie, Jie [4]
Li, Xuanya [5]
Liu, An-An [1, 2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[4] Ocean Univ China, Qingdao, Shandong, Peoples R China
[5] Baidu Inc, Beijing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Attention mechanism; Semantic alignment
DOI
10.1016/j.ipm.2023.103280
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Image-text matching bridges the differences between the visual and textual modalities and plays a considerable role in cross-modal retrieval. Much progress has been achieved through semantic representation and alignment. However, the distribution of multimedia data is severely unbalanced and contains many low-frequency occurrences, which are often ignored and cause performance degradation, i.e., the long-tail effect. In this work, we propose a novel rare-aware attention network (RAAN), which explores and exploits rare textual content to tackle the long-tail effect in image-text matching. Specifically, we first design a rare-aware mining module, which contains global prior information construction and a rare fragment detector to model the characteristics of rare content. Then, rare attention matching uses the prior information as attention to guide the representation enhancement of rare content and introduces a rareness representation to strengthen the similarity calculation. Finally, we design a prior information loss that optimizes the model together with the triplet loss. We perform quantitative and qualitative experiments on two large-scale datasets and achieve leading performance. In particular, we conduct a zero-shot test on rare content and improve rSum by 21.0 and 41.5 on Flickr30K (155,000 image-text pairs) and MSCOCO (616,435 image-text pairs), respectively, demonstrating the effectiveness of the proposed method against the long-tail effect.
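rSum, the metric behind the zero-shot gains reported in the abstract, is conventionally the sum of Recall@1, Recall@5 and Recall@10 in both retrieval directions (image-to-text and text-to-image), so its maximum is 600. The sketch below computes it from an image-caption similarity matrix. It assumes the usual Flickr30K/MSCOCO protocol of five captions per image, with caption j belonging to image j // 5; the function names `recall_at_k` and `rsum` are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def recall_at_k(sims: np.ndarray, captions_per_image: int = 5, ks=(1, 5, 10)):
    """Recall@K (in percent) for both retrieval directions.

    Assumption: sims[i, j] is the similarity between image i and caption j,
    and caption j belongs to image j // captions_per_image.
    """
    n_img, n_cap = sims.shape
    assert n_cap == n_img * captions_per_image

    # Image-to-text: an image counts as a hit at K if ANY of its
    # ground-truth captions is ranked within the top K.
    i2t_ranks = np.empty(n_img, dtype=int)
    for i in range(n_img):
        order = np.argsort(-sims[i])  # caption indices, most similar first
        gt = np.arange(i * captions_per_image, (i + 1) * captions_per_image)
        i2t_ranks[i] = np.where(np.isin(order, gt))[0].min()

    # Text-to-image: each caption has exactly one ground-truth image.
    t2i_ranks = np.empty(n_cap, dtype=int)
    for j in range(n_cap):
        order = np.argsort(-sims[:, j])  # image indices, most similar first
        t2i_ranks[j] = np.where(order == j // captions_per_image)[0][0]

    i2t = {k: 100.0 * np.mean(i2t_ranks < k) for k in ks}
    t2i = {k: 100.0 * np.mean(t2i_ranks < k) for k in ks}
    return i2t, t2i


def rsum(sims: np.ndarray, captions_per_image: int = 5) -> float:
    """Sum of R@1, R@5 and R@10 over both directions (maximum 600)."""
    i2t, t2i = recall_at_k(sims, captions_per_image)
    return float(sum(i2t.values()) + sum(t2i.values()))


if __name__ == "__main__":
    # Toy check: a similarity matrix that scores every true pair highest.
    n_img = 20
    perfect = (np.arange(n_img * 5)[None, :] // 5 == np.arange(n_img)[:, None]).astype(float)
    print(rsum(perfect))
```

Because every recall term is a percentage, a gain of 21.0 or 41.5 rSum is spread across six recall values rather than concentrated in one.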
Pages: 15
Related papers (50 in total)
  • [1] Dual Relation-Aware Synergistic Attention Network for Image-Text Matching
    Qi, Shanshan
    Yang, Luxi
    Li, Chunguo
    Huang, Yongming
    [J]. 2022 11TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS (ICCCAS 2022), 2022, : 251 - 256
  • [2] Negative-Aware Attention Framework for Image-Text Matching
    Zhang, Kun
    Mao, Zhendong
    Wang, Quan
    Zhang, Yongdong
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15640 - 15649
  • [3] Position Focused Attention Network for Image-Text Matching
    Wang, Yaxiong
    Yang, Hao
    Qian, Xueming
    Ma, Lin
    Lu, Jing
    Li, Biao
    Fan, Xin
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3792 - 3798
  • [4] Reference-Aware Adaptive Network for Image-Text Matching
    Xiong, G.
    Meng, M.
    Zhang, T.
    Zhang, D.
    Zhang, Y.
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (10) : 1 - 1
  • [5] Dual Semantic Relationship Attention Network for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [6] Context-Aware Attention Network for Image-Text Retrieval
    Zhang, Qi
    Lei, Zhen
    Zhang, Zhaoxiang
    Li, Stan Z.
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3533 - 3542
  • [7] Cross Attention Graph Matching Network for Image-Text Retrieval
    Yang, Xiaoyu
    Xie, Hao
    Mao, Junyi
    Wang, Zhiguo
    Yin, Guangqiang
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 274 - 286
  • [8] Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching
    Liu, Chunxiao
    Mao, Zhendong
    Liu, An-An
    Zhang, Tianzhu
    Wang, Bin
    Zhang, Yongdong
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 3 - 11
  • [9] Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching
    Zhang, Kun
    Mao, Zhendong
    Liu, An-An
    Zhang, Yongdong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1320 - 1332
  • [10] Global Relation-Aware Attention Network for Image-Text Retrieval
    Cao, Jie
    Qian, Shengsheng
    Zhang, Huaiwen
    Fang, Quan
    Xu, Changsheng
    [J]. PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 19 - 28