Step-Wise Hierarchical Alignment Network for Image-Text Matching

Cited by: 0
Authors
Ji, Zhong [1]
Chen, Kexin [1]
Wang, Haoran [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-text matching plays a central role in bridging the semantic gap between vision and language. The key to precise visual-semantic alignment lies in capturing the fine-grained cross-modal correspondence between image and text. Most previous methods rely on single-step reasoning to discover visual-semantic interactions and therefore lack the ability to exploit multi-level information to locate hierarchical fine-grained relevance. In contrast, we propose a step-wise hierarchical alignment network (SHAN) that decomposes image-text matching into a multi-step cross-modal reasoning process. Specifically, we first achieve local-to-local alignment at the fragment level, followed by global-to-local and then global-to-global alignment at the context level. This progressive alignment strategy supplies our model with more complementary and sufficient semantic clues for understanding the hierarchical correlations between image and text. Experimental results on two benchmark datasets demonstrate the superiority of the proposed method.
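The abstract names three alignment steps (local-to-local, global-to-local, global-to-global) but gives no formulas. The following is only a minimal sketch of how such a step-wise score could be composed, not the authors' implementation. It assumes cosine similarity, softmax attention, mean pooling for the global context, and an unweighted average of the three step scores; none of these choices is confirmed by this record. The function name `stepwise_similarity` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def l2norm(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def stepwise_similarity(regions, words, temperature=4.0):
    """Illustrative three-step image-text score (assumed design).

    regions: (R, D) array of image region features.
    words:   (W, D) array of word features.
    Returns a dict with one score per step and their plain average.
    """
    regions, words = l2norm(regions), l2norm(words)

    # Step 1, local-to-local (fragment level): each word attends over all
    # regions, then is compared with its attended region summary.
    attn = softmax(temperature * (words @ regions.T), axis=1)   # (W, R)
    attended = l2norm(attn @ regions)                           # (W, D)
    l2l = float(np.mean(np.sum(attended * words, axis=1)))

    # Step 2, global-to-local (context level): a pooled image vector
    # attends over the individual words.
    img_global = l2norm(regions.mean(axis=0))                   # (D,)
    word_attn = softmax(temperature * (words @ img_global))     # (W,)
    text_context = l2norm(word_attn @ words)                    # (D,)
    g2l = float(img_global @ text_context)

    # Step 3, global-to-global (context level): pooled image vs. pooled text.
    txt_global = l2norm(words.mean(axis=0))                     # (D,)
    g2g = float(img_global @ txt_global)

    return {"l2l": l2l, "g2l": g2l, "g2g": g2g,
            "total": (l2l + g2l + g2g) / 3.0}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = stepwise_similarity(rng.normal(size=(36, 64)),
                                 rng.normal(size=(12, 64)))
    print(scores)
```

Each step score is a cosine similarity, so it lies in [-1, 1]. A trained model would learn the feature encoders and, most likely, how the three steps are combined; the plain average here is a placeholder.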
Pages: 765-771
Page count: 7
Related papers
  • [1] HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
    Guo, Jie
    Wang, Meiting
    Zhou, Yan
    Song, Bin
    Chi, Yuhao
    Fan, Wei
    Chang, Jianglong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9189 - 9202
  • [2] A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching
    Shang, Heng
    Zhao, Guoshuai
    Shi, Jing
    Qian, Xueming
    [J]. IEEE INTELLIGENT SYSTEMS, 2023, 38 (03) : 41 - 50
  • [3] Local Alignment with Global Semantic Consistence Network for Image-Text Matching
    Li, Pengwei
    Wu, Shihua
    Lian, Zhichao
    [J]. 2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 652 - 657
  • [4] HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
    Wang, Shuhuai
    Liu, Zheng
    Pei, Xinlei
    Xu, Junhao
    [J]. SENSORS, 2023, 23 (05)
  • [5] Multi-level Symmetric Semantic Alignment Network for image-text matching
    Wang, Wenzhuang
    Di, Xiaoguang
    Liu, Maozhen
    Gao, Feng
    [J]. NEUROCOMPUTING, 2024, 599
  • [6] HIERARCHICAL ATTENTION IMAGE-TEXT ALIGNMENT NETWORK FOR PERSON RE-IDENTIFICATION
    Kansal, Kajal
    Subramanyam, A. V.
    Wang, Zheng
    Satoh, Shinichi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2021,
  • [7] Learning hierarchical embedding space for image-text matching
    Sun, Hao
    Qin, Xiaolin
    Liu, Xiaojing
    [J]. INTELLIGENT DATA ANALYSIS, 2024, 28 (03) : 647 - 665
  • [8] Mutil-level Local Alignment and Semantic Matching Network for Image-Text Retrieval
    Jiang, Zhukai
    Lian, Zhichao
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 212 - 224
  • [9] Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching
    Dong, Xinfeng
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6437 - 6447
  • [10] Transformer Reasoning Network for Image-Text Matching and Retrieval
    Messina, Nicola
    Falchi, Fabrizio
    Esuli, Andrea
    Amato, Giuseppe
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5222 - 5229