Fusion layer attention for image-text matching

Cited by: 4

Authors:

Wang, Depeng [1]

Wang, Liejun [2]

Song, Shiji [3]

Huang, Gao [3]

Guo, Yuchen [3]

Cheng, Shuli [2]

Ao, Naixiang [4]

Du, Anyu [2]

Affiliations:

[1] Xinjiang Univ, Software Coll, Urumqi 830046, Peoples R China

[2] Xinjiang Univ, Informat Sci & Engn Coll, Urumqi 830046, Peoples R China

[3] Tsinghua Univ, Beijing 100084, Peoples R China

[4] China Acad Elect & Informat Technol, Xinjiang Lianhai INA INT Informat Technol Ltd, Urumqi 830000, Peoples R China

Source:

NEUROCOMPUTING | 2021 / Vol. 442

Funding:

US National Science Foundation;

Keywords:

Deep learning; Image-text matching; Multimodal; Retrieval;

DOI:

10.1016/j.neucom.2021.01.124

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image-text matching aims to find the relationship between image and text data and to establish a connection between them. The main challenge of image-text matching is that images and texts have different data distributions and feature representations. Current methods for image-text matching fall into two basic types: methods that map image and text data into a common space and then apply distance measurements, and methods that treat image-text matching as a classification problem. In both cases, the two data modes used are image and text data. In our method, we create a fusion layer to extract intermediate modes, thus improving the image-text processing results. We also propose a concise way to update the loss function that makes it easier for neural networks to handle difficult problems. The proposed method was verified on the Flickr30K and MS-COCO datasets and achieved superior matching results compared to existing methods. © 2021 Elsevier B.V. All rights reserved.
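The first family of methods mentioned in the abstract (embed both modalities in a common space, then compare by distance) can be illustrated with a minimal sketch. This is not the paper's fusion layer or its loss update, neither of which is specified in this record. The embeddings are hand-made toy vectors, the function names are hypothetical, and the hinge loss on the hardest negative is a commonly used ranking loss substituted here purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_texts(image_vec, text_vecs):
    """Return text indices sorted from most to least similar to the image."""
    scores = [cosine(image_vec, t) for t in text_vecs]
    return sorted(range(len(text_vecs)), key=lambda i: scores[i], reverse=True)

def hinge_ranking_loss(image_vec, pos_text, neg_texts, margin=0.2):
    """Hinge loss on the hardest negative: max(0, margin - s(pos) + max s(neg))."""
    s_pos = cosine(image_vec, pos_text)
    s_neg = max(cosine(image_vec, n) for n in neg_texts)
    return max(0.0, margin - s_pos + s_neg)

# Toy embeddings in a shared 2-D space (assumed values, not from the paper).
image = [1.0, 0.0]
texts = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]

ranking = rank_texts(image, texts)  # text 1 is the closest match
loss = hinge_ranking_loss(image, texts[1], [texts[0], texts[2]])
```

In a real system the vectors would come from trained image and text encoders; the margin value of 0.2 is likewise an assumption.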

Pages: 249-259

Page count: 11
