Unifying knowledge iterative dissemination and relational reconstruction network for image-text matching

Cited by: 17
Authors
Xie, Xiumin [1]
Li, Zhixin [1]
Tang, Zhenjun [1]
Yao, Dan [1]
Ma, Huifang [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image-text matching; Semantic knowledge; Similarity representation learning; Similarity-relation learning; Graph neural network; ATTENTION;
DOI
10.1016/j.ipm.2022.103154
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image-text matching is a crucial branch of multimedia retrieval that relies on learning inter-modal correspondences. Most existing methods focus on global or local correspondence and fail to explore fine-grained global-local alignment. Moreover, the issue of how to infer more accurate similarity scores remains unresolved. In this study, we propose a novel unifying knowledge iterative dissemination and relational reconstruction (KIDRR) network for image-text matching. Specifically, the knowledge graph iterative dissemination module is designed to iteratively broadcast global semantic knowledge, enabling relevant nodes to be associated and resulting in fine-grained intra-modal correlations and features. Hence, vector-based similarity representations are learned from multiple perspectives to model multi-level alignments comprehensively. The relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the Flickr30K and MSCOCO datasets, which have 31,783 and 123,287 images, respectively. Experiments show that KIDRR achieves improvements of nearly 2.2% and 1.6% relative to Recall@1 on Flickr30K and MSCOCO, respectively, compared to the current state-of-the-art baselines.
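The abstract reports results in Recall@1 without defining the metric in this record. As a rough illustration only (not the authors' evaluation code), Recall@K is commonly computed as the fraction of queries whose correct match appears among the top K retrieved candidates. The sketch below assumes a hypothetical square similarity matrix `sim` in which candidate `i` is the single correct match for query `i`; real benchmarks such as Flickr30K pair each image with several captions, which this simplification ignores.

```python
def recall_at_k(sim, k):
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption for this sketch: candidate i is the only correct match for query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the correct candidate:
        # the number of candidates scored strictly higher than it.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


# Toy similarity matrix (made-up values, for illustration only).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
print(recall_at_k(sim, 1))
print(recall_at_k(sim, 2))
```

In this toy case, queries 0 and 2 rank their correct candidate first, while query 1 ranks it second, so widening K from 1 to 2 recovers the remaining query.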
Pages: 16
Related papers
50 in total
  • [31] Asymmetric Polysemous Reasoning for Image-Text Matching
    Zhang, Hongping
    Yang, Ming
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 1013 - 1022
  • [32] Fusion layer attention for image-text matching
    Wang, Depeng
    Wang, Liejun
    Song, Shiji
    Huang, Gao
    Guo, Yuchen
    Cheng, Shuli
    Ao, Naixiang
    Du, Anyu
    NEUROCOMPUTING, 2021, 442 : 249 - 259
  • [33] Visual Semantic Reasoning for Image-Text Matching
    Li, Kunpeng
    Zhang, Yulun
    Li, Kai
    Li, Yuanyuan
    Fu, Yun
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4653 - 4661
  • [34] Stacked Cross Attention for Image-Text Matching
    Lee, Kuang-Huei
    Chen, Xi
    Hua, Gang
    Hu, Houdong
    He, Xiaodong
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 212 - 228
  • [35] IMAGE-TEXT MATCHING WITH SHARED SEMANTIC CONCEPTS
    Miao Lanxin
    2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022,
  • [36] Multi-level Symmetric Semantic Alignment Network for image-text matching
    Wang, Wenzhuang
    Di, Xiaoguang
    Liu, Maozhen
    Gao, Feng
    NEUROCOMPUTING, 2024, 599
  • [37] Giving Text More Imagination Space for Image-text Matching
    Dong, Xinfeng
    Han, Longfei
    Zhang, Dingwen
    Liu, Li
    Han, Junwei
    Zhang, Huaxiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6359 - 6368
  • [38] Multi-Modal Memory Enhancement Attention Network for Image-Text Matching
    Ji, Zhong
    Lin, Zhigang
    Wang, Haoran
    He, Yuqing
    IEEE ACCESS, 2020, 8 : 38438 - 38447
  • [39] Dual Relation-Aware Synergistic Attention Network for Image-Text Matching
    Qi, Shanshan
    Yang, Luxi
    Li, Chunguo
    Huang, Yongming
    2022 11TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS (ICCCAS 2022), 2022, : 251 - 256
  • [40] Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching
    Liu, Chunxiao
    Mao, Zhendong
    Liu, An-An
    Zhang, Tianzhu
    Wang, Bin
    Zhang, Yongdong
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 3 - 11