Rare-aware attention network for image-text matching

Cited by: 7

Authors:

Wang, Yan [1,2]

Su, Yuting [1]

Li, Wenhui [1]

Sun, Zhengya [3]

Wei, Zhiqiang [4]

Nie, Jie [4]

Li, Xuanya [5]

Liu, An-An [1,2]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China

[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[4] Ocean Univ China, Qingdao, Shandong, Peoples R China

[5] Baidu Inc, Beijing, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2023 / Volume 60 / Issue 03

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Cross modal retrieval; Attention mechanism; Semantic alignment;

DOI:

10.1016/j.ipm.2023.103280

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Image and text matching bridges the differences between the visual and textual modalities and plays a considerable role in cross-modal retrieval. Much progress has been achieved through semantic representation and alignment. However, the distribution of multimedia data is severely unbalanced and contains many low-frequency occurrences, which are often ignored and cause performance degradation, i.e., the long-tail effect. In this work, we propose a novel rare-aware attention network (RAAN), which explores and exploits textual rare content to tackle the long-tail effect in image and text matching. Specifically, we first design a rare-aware mining module, which contains global prior information construction and a rare fragment detector for modeling the characteristics of rare content. Then, the rare attention matching utilizes prior information as attention to guide the representation enhancement of rare content and introduces the rareness representation to strengthen the similarity calculation. Finally, we design a prior information loss to optimize the model together with the triplet loss. We perform quantitative and qualitative experiments on two large-scale databases and achieve leading performance. In particular, we conduct a 0-shot test for rare content and improve rSum by 21.0 and 41.5 on Flickr30K (155,000 image and text pairs) and MSCOCO (616,435 image and text pairs), respectively, demonstrating the effectiveness of the proposed method against the long-tail effect.
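The abstract reports gains in rSum without defining it. In the image-text retrieval literature, rSum is conventionally the sum of Recall@1, Recall@5, and Recall@10 over both retrieval directions (image-to-text and text-to-image), so its maximum is 600. The sketch below illustrates that convention only; it is not the authors' evaluation code. The function names and the toy similarity matrix are hypothetical, and it assumes one ground-truth caption per image (row i matches column i), whereas Flickr30K and MSCOCO pair each image with several captions.

```python
def recall_at_k(sim, k):
    """Percentage of queries (rows) whose true match (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the true match = number of candidates scoring strictly higher.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions (assumed convention)."""
    # Transposing the matrix swaps the roles of query and candidate.
    sim_t = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, k) + recall_at_k(sim_t, k) for k in ks)


# Hypothetical image-by-text similarity scores, for illustration only.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]
score = rsum(sim)
```

With only three candidates, Recall@5 and Recall@10 are trivially 100 in this toy case; on real test sets with thousands of candidates all six terms carry information.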


Pages: 15
