Rare-aware attention network for image-text matching

Cited by: 7

Authors:

Wang, Yan [1,2]

Su, Yuting [1]

Li, Wenhui [1]

Sun, Zhengya [3]

Wei, Zhiqiang [4]

Nie, Jie [4]

Li, Xuanya [5]

Liu, An-An [1,2]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China

[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[4] Ocean Univ China, Qingdao, Shandong, Peoples R China

[5] Baidu Inc, Beijing, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2023 / Vol. 60 / Issue 03

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Cross modal retrieval; Attention mechanism; Semantic alignment;

DOI:

10.1016/j.ipm.2023.103280

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Image and text matching bridges visual and textual modality differences and plays a considerable role in cross-modal retrieval. Much progress has been achieved through semantic representation and alignment. However, the distribution of multimedia data is severely unbalanced and contains many low-frequency occurrences, which are often ignored and cause performance degradation, i.e., the long-tail effect. In this work, we propose a novel rare-aware attention network (RAAN), which explores and exploits textual rare content to tackle the long-tail effect in image and text matching. Specifically, we first design a rare-aware mining module, which contains global prior information construction and a rare fragment detector for modeling the characteristics of rare content. Then, the rare attention matching module uses prior information as attention to guide the representation enhancement of rare content and introduces the rareness representation to strengthen the similarity calculation. Finally, we design a prior information loss to optimize the model together with the triplet loss. We perform quantitative and qualitative experiments on two large-scale databases and achieve leading performance. In particular, we conduct a zero-shot test for rare content and improve rSum by 21.0 and 41.5 on Flickr30K (155,000 image and text pairs) and MSCOCO (616,435 image and text pairs), demonstrating the effectiveness of the proposed method against the long-tail effect.

Pages: 15
