Scene Graph Semantic Inference for Image and Text Matching

Cited by: 10
Authors
Pei, Jiaming [1]
Zhong, Kaiyang [2]
Yu, Zhi [3]
Wang, Lukun [4]
Lakshmanna, Kuruva [5]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
[2] Southwestern Univ Finance & Econ, Sch Comp & Artificial Intelligence, Sichuan 610030, Peoples R China
[3] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 40044, Peoples R China
[4] Shandong Univ Sci & Technol, Coll Intelligent Equipment, Qingdao 271019, Peoples R China
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Keywords
Image and text matching; scene graph; semantic inference
DOI
10.1145/3563390
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of information technology, image and text data have increased dramatically. Image and text matching techniques enable computers to understand information from both visual and text modalities and match them based on semantic content. Existing methods focus on visual and textual object co-occurrence statistics and learning coarse-level associations. However, the lack of intramodal semantic inference leads to the failure of fine-level association between modalities. Scene graphs can capture the interactions between visual and textual objects and model intramodal semantic associations, which are crucial for the understanding of scenes contained in images and text. In this article, we propose a novel scene graph semantic inference network (SGSIN) for image and text matching that effectively learns fine-level semantic information in vision and text to facilitate bridging cross-modal discrepancies. Specifically, we design two matching modules and construct scene graphs within each matching module for aggregating neighborhood information to refine the semantic representation of each object and achieve fine-level alignment of visual and textual modalities. We perform extensive experiments on Flickr30K and MSCOCO and achieve state-of-the-art results, which validate the advantages of our proposed approach.
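The abstract names the mechanism (aggregating neighborhood information over a scene graph to refine each object's representation) but gives no architectural detail. The sketch below is only a generic illustration of that idea, not the SGSIN model: the function names `aggregate_neighbors`, `cosine`, and `match_score`, the `self_weight` parameter, the mean-aggregation rule, and the max-cosine scoring are all assumptions chosen for brevity.

```python
import math


def aggregate_neighbors(features, edges, self_weight=0.5):
    """One round of mean neighborhood aggregation on an undirected graph.

    features: dict mapping node name -> list of floats (object embedding)
    edges: list of (u, v) pairs, e.g. objects linked by a relation
    self_weight: hypothetical mixing weight between a node and its neighbors
    """
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    refined = {}
    for node, vec in features.items():
        nbrs = neighbors[node]
        if not nbrs:
            # Isolated objects keep their original representation.
            refined[node] = list(vec)
            continue
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(vec))]
        refined[node] = [self_weight * x + (1.0 - self_weight) * m
                         for x, m in zip(vec, mean)]
    return refined


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_score(image_nodes, text_nodes):
    """Average, over text nodes, of the best cosine match among image nodes."""
    best = [max(cosine(t, v) for v in image_nodes.values())
            for t in text_nodes.values()]
    return sum(best) / len(best)


# Toy example: a two-object visual graph linked by one relation.
image = {"man": [1.0, 0.0], "horse": [0.0, 1.0]}
refined_image = aggregate_neighbors(image, [("man", "horse")])
```

In this toy setting each refined node mixes its own embedding with the mean of its neighbors, so related objects move closer together before cross-modal scores are computed. A learned model would replace the fixed mean with trained weights.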
Pages: 23