A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching

被引:2
|
作者
Shang, Heng [1 ]
Zhao, Guoshuai [1 ]
Shi, Jing [1 ]
Qian, Xueming [2 ]
机构
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
[2] Xi An Jiao Tong Univ, SMILES Lab, Xian 710049, Peoples R China
基金
中国国家自然科学基金; 中国博士后科学基金;
关键词
Feature extraction; Semantics; Text mining; Intelligent systems; Image representation; Task analysis; Image edge detection;
D O I
10.1109/MIS.2023.3265176
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In image-text matching fields, one of the keys to improving performance is to extract features with more semantic information. Existing works demonstrate that semantic enrichment through knowledge expansion can improve performance. Most of them expand image features, however, the shortage of semantic information in text modality and the unilateral character of the view are often bottlenecks that limit the performance of image-text matching models. To solve the two problems, we aggregate knowledge from multiple views and propose a word imagination graph (WIG). A WIG can be used to expand textual semantic information by imagination based on input images. Then, utilizing WIG, we construct a novel multiview text imagination network (MTIN). A MTIN enables latent alignment of images and texts on tags, which can assist matching on a semantic level. Results from the Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code has been released on GitHub https://github.com/smileslabsh/Multiview-Text-Imagination-Network.
引用
收藏
页码:41 / 50
页数:10
相关论文
共 50 条
  • [21] Region Reinforcement Network With Topic Constraint for Image-Text Matching
    Wu, Jie
    Wu, Chunlei
    Lu, Jing
    Wang, Leiquan
    Cui, Xuerong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 388 - 397
  • [22] Multiview adaptive attention pooling for image-text retrieval
    Ding, Yunlai
    Yu, Jiaao
    Lv, Qingxuan
    Zhao, Haoran
    Dong, Junyu
    Li, Yuezun
    KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [23] Cross Attention Graph Matching Network for Image-Text Retrieval
    Yang, Xiaoyu
    Xie, Hao
    Mao, Junyi
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 274 - 286
  • [24] Improving Image-Text Matching With Bidirectional Consistency of Cross-Modal Alignment
    Li, Zhe
    Zhang, Lei
    Zhang, Kun
    Zhang, Yongdong
    Mao, Zhendong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6590 - 6607
  • [25] Prototype local-global alignment network for image-text retrieval
    Meng, Lingtao
    Zhang, Feifei
    Zhang, Xi
    Xu, Changsheng
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 525 - 538
  • [26] Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching
    Zhang, Kun
    Mao, Zhendong
    Liu, An-An
    Zhang, Yongdong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1320 - 1332
  • [27] HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
    Wang, Shuhuai
    Liu, Zheng
    Pei, Xinlei
    Xu, Junhao
    SENSORS, 2023, 23 (05)
  • [28] Similarity Reasoning and Filtration for Image-Text Matching
    Diao, Haiwen
    Zhang, Ying
    Ma, Lin
    Lu, Huchuan
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1218 - 1226
  • [29] Multi-scale motivated neural network for image-text matching
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 4383 - 4407
  • [30] Dual-View Semantic Inference Network for image-text matching
    Wu, Chunlei
    Wu, Jie
    Cao, Haiwen
    Wei, Yiwei
    Wang, Leiquan
    NEUROCOMPUTING, 2021, 426 : 47 - 57