Fusion layer attention for image-text matching

Cited by: 4
Authors
Wang, Depeng [1]
Wang, Liejun [2]
Song, Shiji [3]
Huang, Gao [3]
Guo, Yuchen [3]
Cheng, Shuli [2]
Ao, Naixiang [4]
Du, Anyu [2]
Affiliations
[1] Xinjiang Univ, Software Coll, Urumqi 830046, Peoples R China
[2] Xinjiang Univ, Informat Sci & Engn Coll, Urumqi 830046, Peoples R China
[3] Tsinghua Univ, Beijing 100084, Peoples R China
[4] China Acad Elect & Informat Technol, Xinjiang Lianhai INA INT Informat Technol Ltd, Urumqi 830000, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Deep learning; Image-text matching; Multimodal; Retrieval;
DOI
10.1016/j.neucom.2021.01.124
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Image-text matching aims to find the relationship between image and text data and to establish a connection between them. The main challenge of image-text matching is the fact that images and texts have different data distributions and feature representations. Current methods for image-text matching fall into two basic types: methods that map image and text data into a common space and then use distance measurements, and methods that treat image-text matching as a classification problem. In both cases, the two data modes used are image and text data. In our method, we create a fusion layer to extract intermediate modes, thus improving the image-text processing results. We also propose a concise way to update the loss function that makes it easier for neural networks to handle difficult problems. The proposed method was verified on the Flickr30K and MS-COCO datasets and achieved superior matching results compared to existing methods. © 2021 Elsevier B.V. All rights reserved.
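As a rough illustration of the first family of methods mentioned in the abstract (mapping both modalities into a common space and comparing them with a distance measure), the sketch below ranks candidate captions by cosine similarity to an image embedding. It is not the paper's fusion layer or loss function. The embedding values and the function names `l2_normalize`, `cosine_similarity`, and `rank_texts` are hypothetical, and the vectors are assumed to be nonzero and already projected into the shared space.

```python
import math


def l2_normalize(vec):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_texts(image_emb, text_embs):
    """Return caption indices ordered from most to least similar to the image."""
    scores = [cosine_similarity(image_emb, t) for t in text_embs]
    return sorted(range(len(text_embs)), key=lambda i: scores[i], reverse=True)


# Hypothetical embeddings, assumed to already live in the common space.
image_emb = [1.0, 0.0, 1.0]
text_embs = [
    [0.0, 1.0, 0.0],  # orthogonal to the image: similarity 0.0
    [2.0, 0.0, 2.0],  # same direction as the image: similarity 1.0
    [1.0, 1.0, 0.0],  # partially aligned: similarity 0.5
]

ranking = rank_texts(image_emb, text_embs)
print(ranking)
```

In a trained system the embeddings would come from learned image and text encoders; only the ranking step is shown here.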
Pages: 249-259
Page count: 11