Scene Graph Semantic Inference for Image and Text Matching

Cited by: 10
Authors
Pei, Jiaming [1]
Zhong, Kaiyang [2]
Yu, Zhi [3]
Wang, Lukun [4]
Lakshmanna, Kuruva [5]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
[2] Southwestern Univ Finance & Econ, Sch Comp & Artificial Intelligence, Sichuan 610030, Peoples R China
[3] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 40044, Peoples R China
[4] Shandong Univ Sci & Technol, Coll Intelligent equipment, Qingdao 271019, Peoples R China
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Keywords
Image and text matching; scene graph; semantic inference;
DOI
10.1145/3563390
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid development of information technology, the volume of image and text data has increased dramatically. Image and text matching techniques enable computers to understand information from both the visual and text modalities and to match them based on semantic content. Existing methods focus on co-occurrence statistics of visual and textual objects and on learning coarse-level associations. However, the lack of intramodal semantic inference leads to the failure of fine-level association between modalities. Scene graphs can capture the interactions between visual and textual objects and model intramodal semantic associations, which are crucial for understanding the scenes contained in images and text. In this article, we propose a novel scene graph semantic inference network (SGSIN) for image and text matching that effectively learns fine-level semantic information in vision and text to help bridge cross-modal discrepancies. Specifically, we design two matching modules and construct scene graphs within each matching module to aggregate neighborhood information, refine the semantic representation of each object, and achieve fine-level alignment of the visual and textual modalities. We perform extensive experiments on Flickr30K and MSCOCO and achieve state-of-the-art results, which validate the advantages of our proposed approach.
Pages: 23
Related papers
50 in total
  • [21] Scene Graph based Fusion Network for Image-Text Retrieval
    Wang, Guoliang
    Shang, Yanlei
    Chen, Yong
    Zhen, Chaoqi
    Cheng, Dequan
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 138 - 143
  • [22] Scene Graph Driven Text-Prompt Generation for Image Inpainting
    Shukla, Tripti
    Maheshwari, Paridhi
    Singh, Rajhans
    Shukla, Ankita
    Kulkarni, Kuldeep
    Turaga, Pavan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2023, : 759 - 768
  • [23] Model-based inexact graph matching on top of DNNs for semantic scene understanding
    Chopin, Jeremy
    Fasquel, Jean-Baptiste
    Mouchere, Harold
    Dahyot, Rozenn
    Bloch, Isabelle
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 235
  • [24] Learning Semantic Relationship among Instances for Image-Text Matching
    Fu, Zheren
    Mao, Zhendong
    Song, Yan
    Zhang, Yongdong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15159 - 15168
  • [25] Dual Semantic Relationship Attention Network for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [26] Enhanced Semantic Similarity Learning Framework for Image-Text Matching
    Zhang, Kun
    Hu, Bo
    Zhang, Huatian
    Li, Zhe
    Mao, Zhendong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2973 - 2988
  • [27] Knowledge Aware Semantic Concept Expansion for Image-Text Matching
    Shi, Botian
    Ji, Lei
    Lu, Pan
    Niu, Zhendong
    Duan, Nan
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5182 - 5189
  • [28] Towards Traffic Scene Description: The Semantic Scene Graph
    Zipfl, Maximilian
    Zoellner, J. Marius
    2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 3748 - 3755
  • [29] Multimodal graph inference network for scene graph generation
    Jingwen Duan
    Weidong Min
    Deyu Lin
    Jianfeng Xu
    Xin Xiong
    Applied Intelligence, 2021, 51 : 8768 - 8783
  • [30] Multimodal graph inference network for scene graph generation
    Duan, Jingwen
    Min, Weidong
    Lin, Deyu
    Xu, Jianfeng
    Xiong, Xin
    APPLIED INTELLIGENCE, 2021, 51 (12) : 8768 - 8783