Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching

Times cited: 0
Authors
Hu, Zhibin [1]
Luo, Yongsheng [1]
Lin, Jiong [1]
Yan, Yan [2]
Chen, Jian [1]
Affiliations
[1] South China University of Technology, School of Software Engineering, Guangzhou, People's Republic of China
[2] University of Iowa, Department of Computer Science, Iowa City, IA 52242, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image-text matching is central to visual-semantic cross-modal retrieval and has attracted extensive attention recently. Previous studies have focused on finding the latent correspondence between image regions and words, e.g., connecting key words to specific regions of salient objects. However, existing methods usually handle concrete objects rather than abstract ones, e.g., a description of some action, even though abstract objects are also ubiquitous in real-world descriptive text. The main challenge in dealing with abstract objects is that, unlike their concrete counterparts, there are no explicit connections between them; one therefore has to find the implicit and intrinsic connections instead. In this paper, we propose a relation-wise dual attention network (RDAN) for image-text matching. Specifically, we maintain an over-complete set that contains pairs of regions and words. Building on this set, we train a visual-semantic network that encodes the local correlations and the global dependencies between regions and words. A dual-pathway attention network then infers the visual-semantic alignments and the image-text similarity. Extensive experiments on several public benchmark datasets validate the efficacy of our method, which achieves state-of-the-art performance.
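The abstract describes the matching pipeline only at a high level. As a rough illustration, the Python sketch below scores one image-text pair with a generic bidirectional cross-attention over the over-complete set of all region-word pairs: each word attends over all regions (a text-to-image pathway), each region attends over all words (an image-to-text pathway), and the two pathway scores are averaged. This is not RDAN itself. The function names, the cosine scoring, the softmax inverse temperature of 9.0 and the equal-weight averaging of the two pathways are assumptions made for illustration, the network that encodes local correlations and global dependencies is omitted entirely, and region and word features are assumed to arrive as plain vectors of equal dimension.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, NOT the RDAN implementation: a generic two-pathway
# cross-attention similarity between image regions and words.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors (eps avoids /0)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)


def softmax(xs: Sequence[float], inverse_temperature: float) -> List[float]:
    """Numerically stable softmax of scores multiplied by an inverse temperature."""
    scaled = [inverse_temperature * x for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> List[float]:
    """Attention-weighted average of a list of vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def pair_similarities(regions: Sequence[Vector], words: Sequence[Vector]) -> List[List[float]]:
    """Over-complete set: cosine similarity for every (region, word) pair.

    Returns a len(regions) x len(words) matrix.
    """
    return [[cosine(r, w) for w in words] for r in regions]


def attend(queries: Sequence[Vector], keys: Sequence[Vector],
           sim: Sequence[Sequence[float]], inverse_temperature: float) -> float:
    """One pathway: each query attends over all keys (sim[q] holds its scores).

    Returns the mean cosine relevance between each query and its attended summary.
    """
    relevances = []
    for q_idx, q in enumerate(queries):
        weights = softmax(sim[q_idx], inverse_temperature)
        attended = weighted_sum(weights, keys)
        relevances.append(cosine(q, attended))
    return sum(relevances) / len(relevances)


def dual_attention_similarity(regions: Sequence[Vector], words: Sequence[Vector],
                              inverse_temperature: float = 9.0) -> float:
    """Average of the text-to-image and image-to-text pathway scores (assumed)."""
    sim = pair_similarities(regions, words)      # regions x words
    sim_t = [list(col) for col in zip(*sim)]     # words x regions
    text_to_image = attend(words, regions, sim_t, inverse_temperature)
    image_to_text = attend(regions, words, sim, inverse_temperature)
    return 0.5 * (text_to_image + image_to_text)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.1], [0.1, 1.0]]
    mismatched = [[-1.0, 0.0], [0.0, -1.0]]
    print(dual_attention_similarity(regions, matched)
          > dual_attention_similarity(regions, mismatched))  # True
```

With these toy 2-D features, the pair whose word vectors point the same way as the region vectors scores close to 1, while the pair with opposing vectors scores near or below zero. Averaging the two pathways is only one simple way to combine them; a learned combination would be an equally plausible assumption.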
Pages: 789-795
Page count: 7
Related papers
  • [1] Multi-level Symmetric Semantic Alignment Network for image-text matching
    Wang, Wenzhuang
    Di, Xiaoguang
    Liu, Maozhen
    Gao, Feng
    [J]. NEUROCOMPUTING, 2024, 599
  • [2] Dual Semantic Relationship Attention Network for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [3] A Multi-level Attention Model for Text Matching
    Sun, Qiang
    Wu, Yue
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139: 142-153
  • [4] Image-text matching algorithm based on multi-level semantic alignment
    Li, Yiru
    Yao, Tao
    Zhang, Linliang
    Sun, Yujuan
    Fu, Haiyan
    [J]. Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (02): 551-558
  • [5] Visual Relation Detection with Multi-Level Attention
    Zheng, Sipeng
    Chen, Shizhe
    Jin, Qin
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 121-129
  • [6] Dual Relation-Aware Synergistic Attention Network for Image-Text Matching
    Qi, Shanshan
    Yang, Luxi
    Li, Chunguo
    Huang, Yongming
    [J]. 2022 11TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS (ICCCAS 2022), 2022: 251-256
  • [7] Learning Dual Semantic Relations With Graph Attention for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    Cheng, Qingrong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07): 2866-2879
  • [8] Multi-level network based on transformer encoder for fine-grained image-text matching
    Yang, Lei
    Feng, Yong
    Zhou, Mingliang
    Xiong, Xiancai
    Wang, Yongheng
    Qiang, Baohua
    [J]. Multimedia Systems, 2023, 29: 1981-1994
  • [9] Image Captioning with multi-level similarity-guided semantic matching
    Li, Jiesi
    Xu, Ning
    Nie, Weizhi
    Zhang, Shenyuan
    [J]. VISUAL INFORMATICS, 2021, 5 (04): 41-48
  • [10] Multi-Relation Attention Network for Image Patch Matching
    Quan, Dou
    Wang, Shuang
    Li, Yi
    Yang, Bowu
    Huyan, Ning
    Chanussot, Jocelyn
    Hou, Biao
    Jiao, Licheng
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30: 7127-7142