Scene Graph based Fusion Network for Image-Text Retrieval

Times cited: 0
Authors
Wang, Guoliang [1]
Shang, Yanlei [1]
Chen, Yong [1]
Zhen, Chaoqi [1]
Cheng, Dequan [1]
Affiliation
[1] Beijing University of Posts and Telecommunications, Institute of Network Technology, Beijing, People's Republic of China
Keywords
Scene Graph; Hierarchical Attention; Contextual Vectors; Image-Text Retrieval
DOI
10.1109/ICME55011.2023.00032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A critical challenge in image-text retrieval is learning accurate correspondences between images and texts. Most existing methods focus on coarse-grained correspondences based on the co-occurrence of semantic objects and fail to distinguish fine-grained local correspondences. In this paper, we propose a novel Scene Graph based Fusion Network (dubbed SGFN), which enhances image and text features through intra- and cross-modal fusion for image-text retrieval. Specifically, we design an intra-modal hierarchical attention fusion that incorporates semantic contexts, such as objects, attributes, and relationships, into image and text feature vectors via scene graphs, and a cross-modal attention fusion that combines the contextual semantics and local fusion via contextual vectors. Extensive experiments on the public datasets Flickr30K and MSCOCO show that SGFN outperforms several state-of-the-art (SOTA) image-text retrieval methods.
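The abstract describes the cross-modal attention fusion only at a high level. As a rough illustration, the following is a minimal sketch of generic scaled dot-product cross-attention, assuming image-region features act as queries and word features act as keys and values. The additive `fuse` step and the SCAN-style averaged-cosine `similarity` score are hypothetical choices made for illustration; they are not SGFN's published formulation, which this record does not specify.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys_values: List[Vector]) -> List[Vector]:
    """For each query (e.g. an image region), return the attention-weighted
    sum of the other modality's features (e.g. words), which serve as both
    keys and values in this simplified sketch."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    contexts = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys_values])
        ctx = [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(d)]
        contexts.append(ctx)
    return contexts


def fuse(local: List[Vector], context: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: element-wise sum of each local feature and its
    attended cross-modal context vector."""
    return [[a + b for a, b in zip(l, c)] for l, c in zip(local, context)]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    # eps guards against division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)


def similarity(regions: List[Vector], words: List[Vector]) -> float:
    """Assumed SCAN-style score: mean cosine between each region and the
    text context it attends to."""
    contexts = cross_attend(regions, words)
    return sum(cosine(r, c) for r, c in zip(regions, contexts)) / len(regions)
```

For example, `similarity([[1.0, 0.0]], [[1.0, 0.0]])` is close to 1, while `similarity([[1.0, 0.0]], [[0.0, 1.0]])` is 0, because the region attends to a word feature that is orthogonal to it. A real model would use learned projections for queries, keys, and values rather than reusing the raw features.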
Pages: 138-143
Number of pages: 6