Prototype local-global alignment network for image-text retrieval

Cited by: 2
Authors
Meng, Lingtao [1]
Zhang, Feifei [1]
Zhang, Xi [2]
Xu, Changsheng [2]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Binshui West St, Tianjin 300380, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Automat, East Zhongguancun Rd, Beijing 100080, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Image-text retrieval; Local alignment; Global alignment; Prototype
DOI
10.1007/s13735-022-00258-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-text retrieval is a challenging task due to the requirement of thorough multimodal understanding and precise inter-modality relationship discovery. However, most previous approaches resort to doing global image-text alignment and neglect fine-grained correspondence. Although some works explore local region-word alignment, they usually suffer from a heavy computing burden. In this paper, we propose a prototype local-global alignment (PLGA) network for image-text retrieval by jointly performing the fine-grained local alignment and high-level global alignment. Specifically, our PLGA contains two key components: a prototype-based local alignment module and a multi-scale global alignment module. The former enables efficient fine-grained local matching by combining region-prototype alignment and word-prototype alignment, and the latter helps perceive hierarchical global semantics by exploring multi-scale global correlations between the image and text. Overall, the local and global alignment modules can boost their performances for each other via the unified model. Quantitative and qualitative experimental results on Flickr30K and MS-COCO benchmarks demonstrate that our proposed approach performs favorably against state-of-the-art methods.
Pages: 525-538
Page count: 14