Scene graph fusion and negative sample generation strategy for image-text matching

Cited by: 0
Authors
Wang, Liqin [1, 2, 3]
Yang, Pengcheng [1]
Wang, Xu [1, 2, 3]
Xu, Zhihong [1, 2, 3]
Dong, Yongfeng [1, 2, 3]
Affiliations
[1] School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
[2] Hebei Province Key Laboratory of Big Data Calculation, Tianjin 300401, China
[3] Hebei Data Driven Industrial Intelligent Engineering Research Center, Tianjin 300401, China
Source
Journal of Supercomputing, 2025, Vol. 81, No. 1
Keywords
Semantics
DOI
10.1007/s11227-024-06652-2
Abstract
In the field of image-text matching, scene graph-based approaches are commonly employed to detect semantic associations between entities in cross-modal information, improving cross-modal interaction by capturing finer-grained associations. However, the associations between images and texts are often modeled implicitly, which leaves a semantic gap between image and text information. To address the lack of cross-modal information integration and to explicitly model fine-grained semantic information in images and texts, we propose a scene graph fusion and negative sample generation strategy for image-text matching (SGFNS). Furthermore, to better represent the subtle features that distinguish similar images, we propose a negative sample generation strategy and introduce an extra loss function that incorporates the generated negative samples into training. Experiments verify the effectiveness of our model compared with current state-of-the-art models that use scene graphs directly. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
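The abstract does not state the exact form of the extra loss. As a minimal sketch, assuming the widely used hinge-based bidirectional triplet ranking loss with hardest in-batch negatives (the VSE++ formulation) as the base objective, the snippet below shows one way generated negative captions could be added through an extra hinge term. The function names, the `margin` and `weight` parameters, and the form of `generated_negative_loss` are hypothetical illustrations, not the SGFNS loss as published.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_ranking_loss(images: List[Vector], texts: List[Vector],
                         margin: float = 0.2) -> float:
    """VSE++-style loss with hardest in-batch negatives in both directions.

    images[i] and texts[i] form a matched pair; the batch needs >= 2 pairs.
    """
    n = len(images)
    sim = [[cosine(images[i], texts[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest negative caption for image i, and hardest negative image for text i.
        hardest_text = max(sim[i][j] for j in range(n) if j != i)
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin - pos + hardest_text)
        total += max(0.0, margin - pos + hardest_image)
    return total


def generated_negative_loss(images: List[Vector], texts: List[Vector],
                            neg_texts: List[Vector], margin: float = 0.2) -> float:
    """Hypothetical extra term: each image must score its true caption
    at least `margin` above a generated (hard) negative caption."""
    total = 0.0
    for img, txt, neg in zip(images, texts, neg_texts):
        total += max(0.0, margin - cosine(img, txt) + cosine(img, neg))
    return total


def total_loss(images: List[Vector], texts: List[Vector], neg_texts: List[Vector],
               margin: float = 0.2, weight: float = 1.0) -> float:
    """Base ranking loss plus a weighted generated-negative term (weight is assumed)."""
    return (triplet_ranking_loss(images, texts, margin)
            + weight * generated_negative_loss(images, texts, neg_texts, margin))
```

With perfectly aligned, mutually orthogonal embeddings, the in-batch ranking loss is zero, yet a generated negative that is nearly identical to the true caption (for example `[1.0, 0.1]` against `[1.0, 0.0]`) still produces a positive penalty. That is the kind of fine-grained signal a negative sample generation strategy aims to supply when ordinary in-batch negatives are already easy.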