Scene Graph Semantic Inference for Image and Text Matching

Citations: 10
Authors
Pei, Jiaming [1]
Zhong, Kaiyang [2]
Yu, Zhi [3]
Wang, Lukun [4]
Lakshmanna, Kuruva [5]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
[2] Southwestern Univ Finance & Econ, Sch Comp & Artificial Intelligence, Sichuan 610030, Peoples R China
[3] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 40044, Peoples R China
[4] Shandong Univ Sci & Technol, Coll Intelligent equipment, Qingdao 271019, Peoples R China
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Keywords
Image and text matching; scene graph; semantic inference
DOI
10.1145/3563390
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid development of information technology, the volume of image and text data has increased dramatically. Image and text matching techniques enable computers to understand information from both visual and text modalities and match them based on semantic content. Existing methods focus on visual and textual object co-occurrence statistics and on learning coarse-level associations. However, the lack of intramodal semantic inference leads to the failure of fine-level association between modalities. Scene graphs can capture the interactions between visual and textual objects and model intramodal semantic associations, which are crucial for understanding the scenes contained in images and text. In this article, we propose a novel scene graph semantic inference network (SGSIN) for image and text matching that effectively learns fine-level semantic information in vision and text to facilitate bridging cross-modal discrepancies. Specifically, we design two matching modules and construct scene graphs within each matching module for aggregating neighborhood information to refine the semantic representation of each object and achieve fine-level alignment of visual and textual modalities. We perform extensive experiments on Flickr30K and MSCOCO and achieve state-of-the-art results, which validate the advantages of our proposed approach.
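The abstract's idea of aggregating neighborhood information over a scene graph to refine each object's representation can be illustrated with a minimal sketch. This is not the SGSIN implementation, whose details are not given in this record. The function name `aggregate_neighbors`, the `self_weight` blending parameter, the mean-pooling rule, and the toy features are all assumptions made for illustration only.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_neighbors(features: Dict[str, Vector],
                        edges: Dict[str, List[str]],
                        self_weight: float = 0.5) -> Dict[str, Vector]:
    """One round of mean-neighbor aggregation over a scene graph.

    Hypothetical rule (an assumption, not the paper's): each object's
    refined vector blends its own feature with the mean of the features
    of the objects it is connected to in the graph.
    """
    refined: Dict[str, Vector] = {}
    for node, feat in features.items():
        neighbors = edges.get(node, [])
        if not neighbors:
            # Isolated objects keep their original representation.
            refined[node] = list(feat)
            continue
        # Element-wise mean of the neighbor features.
        mean = [
            sum(features[n][i] for n in neighbors) / len(neighbors)
            for i in range(len(feat))
        ]
        refined[node] = [
            self_weight * f + (1.0 - self_weight) * m
            for f, m in zip(feat, mean)
        ]
    return refined


# Toy scene graph: "man" is related to both "horse" and "hat".
features = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "hat": [1.0, 1.0]}
edges = {"man": ["horse", "hat"], "horse": ["man"], "hat": ["man"]}
refined = aggregate_neighbors(features, edges)
print(refined)
```

In this toy setting, the refined vector for "man" moves toward the features of "horse" and "hat", which is the sense in which neighborhood aggregation injects relational context into an object's representation.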
Pages: 23
Related papers
50 items in total
  • [41] Local Alignment with Global Semantic Consistence Network for Image-Text Matching
    Li, Pengwei
    Wu, Shihua
    Lian, Zhichao
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 652 - 657
  • [42] Cross-modal Semantic Interference Suppression for image-text matching
    Yao, Tao
    Peng, Shouyong
    Sun, Yujuan
    Sheng, Guorui
    Fu, Haiyan
    Kong, Xiangwei
    Engineering Applications of Artificial Intelligence, 2024, 133
  • [43] Multiple graph matching with Bayesian inference
    Williams, ML
    Wilson, RC
    Hancock, ER
    PATTERN RECOGNITION LETTERS, 1997, 18 (11-13) : 1275 - 1281
  • [44] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [45] SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
    Wang, Zhecan
    You, Haoxuan
    Li, Liunian Harold
    Zareian, Alireza
    Park, Suji
    Liang, Yiqing
    Chang, Kai-Wei
    Chang, Shih-Fu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 5914 - 5922
  • [46] Towards Explainable Semantic Text Matching
    Landthaler, Joerg
    Glaser, Ingo
    Matthes, Florian
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS (JURIX 2018), 2018, 313 : 200 - 204
  • [47] Visual and semantic guided scene text retrieval
    Luo, Hailong
    Ibrayim, Mayire
    Hamdulla, Askar
    Deng, Qilin
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (14) : 21394 - 21411
  • [48] Conceptual graph matching for semantic search
    Zhong, JW
    Zhu, HP
    Li, JM
    Yu, Y
    CONCEPTUAL STRUCTURES: INTEGRATION AND INTERFACES, PROCEEDINGS, 2002, 2393 : 92 - 106
  • [49] Arabic Text Semantic Graph Representation
    Al Etaiwi, Wael Mahmoud
    Awajan, Arafat
    2019 2ND INTERNATIONAL CONFERENCE ON NEW TRENDS IN COMPUTING SCIENCES (ICTCS), 2019, : 265 - 270
  • [50] A randomized heuristic for scene recognition by graph matching
    Boeres, MC
    Ribeiro, CC
    Bloch, I
    EXPERIMENTAL AND EFFICIENT ALGORITHMS, 2004, 3059 : 100 - 113