Prototype local-global alignment network for image-text retrieval

Cited by: 2
Authors
Meng, Lingtao [1]
Zhang, Feifei [1]
Zhang, Xi [2]
Xu, Changsheng [2]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Binshui West St, Tianjin 300380, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Automat, East Zhongguancun Rd, Beijing 100080, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Image-text retrieval; Local alignment; Global alignment; Prototype
DOI
10.1007/s13735-022-00258-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text retrieval is a challenging task because it requires thorough multimodal understanding and precise discovery of inter-modality relationships. Most previous approaches rely on global image-text alignment and neglect fine-grained correspondence. Although some works explore local region-word alignment, they usually suffer from a heavy computing burden. In this paper, we propose a prototype local-global alignment (PLGA) network for image-text retrieval that jointly performs fine-grained local alignment and high-level global alignment. Specifically, PLGA contains two key components: a prototype-based local alignment module and a multi-scale global alignment module. The former enables efficient fine-grained local matching by combining region-prototype alignment and word-prototype alignment, and the latter helps perceive hierarchical global semantics by exploring multi-scale global correlations between the image and text. Within the unified model, the local and global alignment modules reinforce each other. Quantitative and qualitative experimental results on the Flickr30K and MS-COCO benchmarks demonstrate that the proposed approach performs favorably against state-of-the-art methods.
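The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea, not the PLGA architecture itself. It assumes that regions and words are each softly assigned to a shared set of prototype vectors, that the local score compares the two resulting prototype profiles, and that the global score compares mean-pooled features at a single scale. All function names, the temperature, and the mixing weight are hypothetical. Under these assumptions the local step costs on the order of (regions + words) × prototypes similarity computations instead of regions × words.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to (approximately) unit length along the given axis.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_profile(features, prototypes, temperature=0.1):
    # Hypothetical helper: softly assign each local feature (region or word)
    # to the prototypes, then average the assignments into one profile.
    sims = l2_normalize(features) @ l2_normalize(prototypes).T  # (n, k)
    assign = softmax(sims / temperature, axis=1)                # rows sum to 1
    return assign.mean(axis=0)                                  # (k,)

def local_score(regions, words, prototypes):
    # Assumed local alignment: cosine similarity between the image's and the
    # text's prototype profiles, so regions and words are never compared pairwise.
    p_img = prototype_profile(regions, prototypes)
    p_txt = prototype_profile(words, prototypes)
    return float(l2_normalize(p_img) @ l2_normalize(p_txt))

def global_score(regions, words):
    # Assumed global alignment: cosine similarity of mean-pooled features
    # (a single scale only; the paper describes a multi-scale module).
    g_img = l2_normalize(regions.mean(axis=0))
    g_txt = l2_normalize(words.mean(axis=0))
    return float(g_img @ g_txt)

def match_score(regions, words, prototypes, weight=0.5):
    # Hypothetical fusion: weighted sum of the local and global scores.
    return (weight * local_score(regions, words, prototypes)
            + (1.0 - weight) * global_score(regions, words))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(36, 16))     # e.g. 36 region features
    words = rng.normal(size=(12, 16))       # e.g. 12 word features
    prototypes = rng.normal(size=(8, 16))   # 8 shared prototypes
    print(match_score(regions, words, prototypes))
```

In a trained model the prototypes and feature encoders would be learned; here they are random arrays used only to show the shapes involved.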
Pages: 525-538
Page count: 14