Scene Graph Semantic Inference for Image and Text Matching

Cited by: 10
Authors
Pei, Jiaming [1]
Zhong, Kaiyang [2]
Yu, Zhi [3]
Wang, Lukun [4]
Lakshmanna, Kuruva [5]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
[2] Southwestern Univ Finance & Econ, Sch Comp & Artificial Intelligence, Sichuan 610030, Peoples R China
[3] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 40044, Peoples R China
[4] Shandong Univ Sci & Technol, Coll Intelligent equipment, Qingdao 271019, Peoples R China
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Keywords
Image and text matching; scene graph; semantic inference;
DOI
10.1145/3563390
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid development of information technology, the volume of image and text data has increased dramatically. Image and text matching techniques enable computers to understand information from both the visual and text modalities and match them based on semantic content. Existing methods focus on co-occurrence statistics of visual and textual objects and learn only coarse-level associations. However, the lack of intramodal semantic inference leads to the failure of fine-level association between modalities. Scene graphs can capture the interactions between visual and textual objects and model intramodal semantic associations, which are crucial for understanding the scenes contained in images and text. In this article, we propose a novel scene graph semantic inference network (SGSIN) for image and text matching that effectively learns fine-level semantic information in vision and text to help bridge cross-modal discrepancies. Specifically, we design two matching modules and construct scene graphs within each matching module to aggregate neighborhood information, refine the semantic representation of each object, and achieve fine-level alignment of the visual and textual modalities. We perform extensive experiments on Flickr30K and MSCOCO and achieve state-of-the-art results, which validate the advantages of our proposed approach.
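The abstract mentions aggregating neighborhood information over a scene graph to refine each object's representation. As a rough illustration only, the following is a minimal sketch of one round of mean-neighbor aggregation on a toy graph. It is not the SGSIN architecture, which this record does not specify; the function name `aggregate_neighbors`, the `self_weight` mixing parameter, and the toy features are all assumptions introduced here.

```python
from typing import Dict, List, Tuple


def aggregate_neighbors(
    features: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
    self_weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> Dict[str, List[float]]:
    """Blend each node's vector with the mean of its neighbors' vectors."""
    # Build an undirected adjacency list from the relation edges.
    neighbors: Dict[str, List[str]] = {node: [] for node in features}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    refined: Dict[str, List[float]] = {}
    for node, vec in features.items():
        nbrs = neighbors[node]
        if not nbrs:
            # Isolated nodes keep their original representation.
            refined[node] = list(vec)
            continue
        # Element-wise mean over the neighbors' original feature vectors.
        mean = [
            sum(features[n][i] for n in nbrs) / len(nbrs)
            for i in range(len(vec))
        ]
        refined[node] = [
            self_weight * v + (1.0 - self_weight) * m
            for v, m in zip(vec, mean)
        ]
    return refined


# Toy scene graph: "man" is related to both "horse" and "hat".
features = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "hat": [1.0, 1.0]}
edges = [("man", "horse"), ("man", "hat")]
refined = aggregate_neighbors(features, edges)
print(refined)
```

In a learned model the fixed mean and mixing weight would typically be replaced by trainable transformations or attention weights; the sketch only shows how an object's vector can absorb context from the objects it is related to.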
Pages: 23
Related papers
50 in total
  • [31] Semantic image segmentation based on spatial relationships and inexact graph matching
    Chopin, Jeremy
    Fasquel, Jean-Baptiste
    Mouchere, Harold
    Dahyot, Rozenn
    Bloch, Isabelle
    2020 TENTH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA), 2020
  • [32] Adaptive Latent Graph Representation Learning for Image-Text Matching
    Tian, Mengxiao
    Wu, Xinxiao
    Jia, Yunde
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 471 - 482
  • [33] Cross Attention Graph Matching Network for Image-Text Retrieval
    Yang, Xiaoyu
    Xie, Hao
    Mao, Junyi
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 274 - 286
  • [34] Text-Image Scene Graph Fusion for Multimodal Named Entity Recognition
    Cheng J.
    Long K.
    Zhang S.
    Zhang T.
    Ma L.
    Cheng S.
    Guo Y.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (06) : 2828 - 2839
  • [35] Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching
    Zhang, Huatian
    Zhang, Lei
    Zhang, Kun
    Mao, Zhendong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024 : 7105 - 7114
  • [36] HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs
    Liu, Fangyu
    Ye, Rongtian
    Wang, Xun
    Li, Shuaipeng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11563 - 11571
  • [37] A method for image-text matching based on semantic filtering and adaptive adjustment
    Jin, Ran
    Hou, Tengda
    Jin, Tao
    Yuan, Jie
    Du, Chenjie
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2024, 2024 (01)
  • [38] Cross-Modal Attention With Semantic Consistence for Image-Text Matching
    Xu, Xing
    Wang, Tan
    Yang, Yang
    Zuo, Lin
    Shen, Fumin
    Shen, Heng Tao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (12) : 5412 - 5425
  • [39] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [40] Cross-modal Semantic Interference Suppression for image-text matching
    Yao, Tao
    Peng, Shouyong
    Sun, Yujuan
    Sheng, Guorui
    Fu, Haiyan
    Kong, Xiangwei
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133