Step-Wise Hierarchical Alignment Network for Image-Text Matching

Authors
Ji, Zhong [1]
Chen, Kexin [1]
Wang, Haoran [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image-text matching plays a central role in bridging the semantic gap between vision and language. The key to precise visual-semantic alignment lies in capturing the fine-grained cross-modal correspondence between image and text. Most previous methods rely on single-step reasoning to discover visual-semantic interactions, and therefore cannot exploit multi-level information to locate hierarchical fine-grained relevance. In contrast, we propose a step-wise hierarchical alignment network (SHAN) that decomposes image-text matching into a multi-step cross-modal reasoning process. Specifically, we first perform local-to-local alignment at the fragment level, followed by global-to-local and then global-to-global alignment at the context level. This progressive alignment strategy supplies our model with more complementary and sufficient semantic clues for understanding the hierarchical correlations between image and text. Experimental results on two benchmark datasets demonstrate the superiority of the proposed method.
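The three alignment steps named in the abstract can be illustrated with a toy sketch. This is not the authors' SHAN implementation: the abstract gives no formulas, so the attention scheme, the mean-pooled global vectors, the temperature value, the equal-weight averaging of the three step scores, and every function name below are assumptions chosen only to show the local-to-local, global-to-local, global-to-global ordering.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0 or nb == 0:
        return 0.0
    return dot(a, b) / (na * nb)


def softmax(scores, temperature):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def mean_vector(vectors):
    n = len(vectors)
    return weighted_sum([1.0 / n] * n, vectors)


def attend(query, keys, temperature=0.1):
    """Assumed attention: softmax over cosine similarities, then a weighted sum."""
    weights = softmax([cosine(query, k) for k in keys], temperature)
    return weighted_sum(weights, keys)


def local_to_local(regions, words):
    """Step 1 (fragment level): each word attends over the image regions."""
    scores = [cosine(w, attend(w, regions)) for w in words]
    return sum(scores) / len(scores)


def global_to_local(regions, words):
    """Step 2 (context level): a pooled image vector attends over the words."""
    image_global = mean_vector(regions)
    return cosine(image_global, attend(image_global, words))


def global_to_global(regions, words):
    """Step 3 (context level): pooled image vector against pooled text vector."""
    return cosine(mean_vector(regions), mean_vector(words))


def stepwise_score(regions, words):
    """Assumed fusion: plain average of the three step scores."""
    steps = (
        local_to_local(regions, words),
        global_to_local(regions, words),
        global_to_global(regions, words),
    )
    return sum(steps) / len(steps)


# Toy features (hypothetical): two image regions and two candidate captions.
toy_regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_matching_words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_unrelated_words = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(stepwise_score(toy_regions, toy_matching_words))
print(stepwise_score(toy_regions, toy_unrelated_words))
```

In this sketch the caption whose word vectors line up with the region vectors scores higher than the unrelated caption at every step. A trained model would replace the hand-written vectors with learned region and word embeddings.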
Pages: 765-771
Page count: 7