Cross Attention Graph Matching Network for Image-Text Retrieval

Authors
Yang, Xiaoyu [1]
Xie, Hao [2]
Mao, Junyi [1]
Wang, Zhiguo [1]
Yin, Guangqiang [1, 2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu 611730, Peoples R China
[2] UESTC, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Kashi Inst Elect & Informat Ind, Kashi 844199, Peoples R China
Keywords
Image-Text Retrieval; Cross-Attention; Intra-modal Reasoning; Graph Matching; Cross-modal Matching
DOI
10.1007/978-981-99-9243-0_28
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Image-text retrieval is a fundamental cross-modal task whose core problem is learning to match images with text. Graph convolutional networks are widely used in visual-semantic tasks, where graph structures represent otherwise unstructured information as nodes and the correlations between them. In this paper, we propose CAGMN, an image-text retrieval model based on cross-attention graph matching. We model the salient regions of the image and the words of the text as graph nodes, then apply a graph convolutional network to each modality separately to infer and extract intra-modal relationships. In addition, a cross-attention feature extraction method is introduced to promote the flow of matching information between image regions and words, yielding features that carry cross-modal matching information and making full use of both intra-modal and inter-modal information. Finally, graph structure matching and global image-text similarity are computed, and both are used together to learn the image-text matching relationship at different levels.
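The cross-attention step described in the abstract can be illustrated with a small sketch. This is not the CAGMN implementation: the abstract gives no equations, so the cosine-based attention, the `temperature` value, the function names, and the toy 3-dimensional features below are all assumptions chosen for illustration, and the graph convolution and graph matching stages are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, contexts, temperature=4.0):
    """For each query vector, return an attention-weighted sum of context vectors."""
    dim = len(contexts[0])
    attended = []
    for q in queries:
        # Attention weights come from scaled cosine similarity (an assumption).
        weights = softmax([temperature * cosine(q, c) for c in contexts])
        attended.append(
            [sum(w * c[d] for w, c in zip(weights, contexts)) for d in range(dim)]
        )
    return attended

def image_text_similarity(regions, words, temperature=4.0):
    """Mean cosine similarity between each region and its attended text vector."""
    attended = cross_attend(regions, words, temperature)
    return sum(cosine(r, a) for r, a in zip(regions, attended)) / len(regions)

# Toy features, made up for illustration only.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
matching_words = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
unrelated_words = [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]

score_match = image_text_similarity(regions, matching_words)
score_mismatch = image_text_similarity(regions, unrelated_words)
print(score_match > score_mismatch)
```

In this toy setup, a caption whose word vectors align with the image regions receives a higher score than an unrelated caption, which is the behavior a retrieval model would rank on.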
Pages: 274-286
Page count: 13
Related papers
50 in total
  • [1] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [2] Stacked Cross Attention for Image-Text Matching
    Lee, Kuang-Huei
    Chen, Xi
    Hua, Gang
    Hu, Houdong
    He, Xiaodong
    [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 212 - 228
  • [3] Flexible graph-based attention and pooling network for image-text retrieval
    Sun, Hao
    Qin, Xiaolin
    Liu, Xiaojing
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (19) : 57895 - 57912
  • [4] Transformer Reasoning Network for Image-Text Matching and Retrieval
    Messina, Nicola
    Falchi, Fabrizio
    Esuli, Andrea
    Amato, Giuseppe
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 5222 - 5229
  • [5] Heterogeneous Graph Fusion Network for cross-modal image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    Hao, Fei
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [6] Position Focused Attention Network for Image-Text Matching
    Wang, Yaxiong
    Yang, Hao
    Qian, Xueming
    Ma, Lin
    Lu, Jing
    Li, Biao
    Fan, Xin
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019 : 3792 - 3798
  • [7] Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
    Wang, Sijin
    Wang, Ruiping
    Yao, Ziwei
    Shan, Shiguang
    Chen, Xilin
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020 : 1497 - 1506
  • [8] SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval
    Ji, Zhong
    Wang, Haoran
    Han, Jungong
    Pang, Yanwei
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1086 - 1097
  • [9] HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
    Guo, Jie
    Wang, Meiting
    Zhou, Yan
    Song, Bin
    Chi, Yuhao
    Fan, Wei
    Chang, Jianglong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9189 - 9202
  • [10] Scene Graph based Fusion Network for Image-Text Retrieval
    Wang, Guoliang
    Shang, Yanlei
    Chen, Yong
    Zhen, Chaoqi
    Cheng, Dequan
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 138 - 143