A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching

Cited by: 2

Authors:

Shang, Heng [1]

Zhao, Guoshuai [1]

Shi, Jing [1]

Qian, Xueming [2]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China

[2] Xi An Jiao Tong Univ, SMILES Lab, Xian 710049, Peoples R China

Source:

IEEE INTELLIGENT SYSTEMS | 2023 / Vol. 38 / No. 03

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Feature extraction; Semantics; Text mining; Intelligent systems; Image representation; Task analysis; Image edge detection;

DOI:

10.1109/MIS.2023.3265176

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In image-text matching, one key to improving performance is extracting features that carry more semantic information. Existing works demonstrate that semantic enrichment through knowledge expansion can improve performance. Most of them expand image features; however, the shortage of semantic information in the text modality and the one-sidedness of a single view are often bottlenecks that limit the performance of image-text matching models. To solve these two problems, we aggregate knowledge from multiple views and propose a word imagination graph (WIG). A WIG can be used to expand textual semantic information by imagination based on input images. Then, utilizing the WIG, we construct a novel multiview text imagination network (MTIN). An MTIN enables latent alignment of images and texts on tags, which can assist matching on a semantic level. Results on the Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code has been released on GitHub: https://github.com/smileslabsh/Multiview-Text-Imagination-Network.

Pages: 41 - 50

Page count: 10

50 items in total

[31] Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching
Dong, Xinfeng
Zhang, Huaxiang
Zhu, Lei
Nie, Liqiang
Liu, Li
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6437 - 6447
[32] Asymmetric Polysemous Reasoning for Image-Text Matching
Zhang, Hongping
Yang, Ming
2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 1013 - 1022
[33] Cross-modal Semantically Augmented Network for Image-text Matching
Yao, Tao
Li, Yiru
Li, Ying
Zhu, Yingying
Wang, Gang
Yue, Jun
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
[34] Global-Guided Asymmetric Attention Network for Image-Text Matching
Wu, Dongqing
Li, Huihui
Tang, Yinge
Guo, Lei
Liu, Hang
NEUROCOMPUTING, 2022, 481 : 77 - 90
[35] Multi-scale motivated neural network for image-text matching
Qin, Xueyang
Li, Lishuang
Pang, Guangyao
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 : 4383 - 4407
[36] Fusion layer attention for image-text matching
Wang, Depeng
Wang, Liejun
Song, Shiji
Huang, Gao
Guo, Yuchen
Cheng, Shuli
Ao, Naixiang
Du, Anyu
NEUROCOMPUTING, 2021, 442 : 249 - 259
[37] Cross-modal Graph Matching Network for Image-text Retrieval
Cheng, Yuhao
Zhu, Xiaoguang
Qian, Jiuchao
Wen, Fei
Liu, Peilin
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
[38] Visual Semantic Reasoning for Image-Text Matching
Li, Kunpeng
Zhang, Yulun
Li, Kai
Li, Yuanyuan
Fu, Yun
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4653 - 4661
[40] CycleMatch: A cycle-consistent embedding network for image-text matching
Liu, Yu
Guo, Yanming
Liu, Li
Bakker, Erwin M.
Lew, Michael S.
PATTERN RECOGNITION, 2019, 93 : 365 - 379