Prototype local-global alignment network for image-text retrieval

Cited by: 2
Authors
Meng, Lingtao [1]
Zhang, Feifei [1]
Zhang, Xi [2]
Xu, Changsheng [2]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Binshui West St, Tianjin 300380, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Automat, East Zhongguancun Rd, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Image-text retrieval; Local alignment; Global alignment; Prototype;
DOI
10.1007/s13735-022-00258-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text retrieval is challenging because it requires thorough multimodal understanding and precise discovery of inter-modality relationships. However, most previous approaches rely on global image-text alignment and neglect fine-grained correspondence. Although some works explore local region-word alignment, they usually incur a heavy computational burden. In this paper, we propose a prototype local-global alignment (PLGA) network for image-text retrieval that jointly performs fine-grained local alignment and high-level global alignment. Specifically, PLGA contains two key components: a prototype-based local alignment module and a multi-scale global alignment module. The former enables efficient fine-grained local matching by combining region-prototype alignment and word-prototype alignment, and the latter helps perceive hierarchical global semantics by exploring multi-scale global correlations between the image and text. Overall, the local and global alignment modules reinforce each other within the unified model. Quantitative and qualitative experimental results on the Flickr30K and MS-COCO benchmarks demonstrate that the proposed approach performs favorably against state-of-the-art methods.
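The abstract does not give the model's equations, so the following is only a rough sketch of the general idea of matching through shared prototypes, not the authors' PLGA implementation. It assumes region and word features already live in a common embedding space, that a small set of prototype vectors is available, and that each modality is summarized by its soft assignment to those prototypes. All function names, the temperature, and the mixing weight are hypothetical, and the global term here is a single mean-pooled cosine similarity rather than the multi-scale module the abstract describes.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def prototype_profile(features, prototypes, temperature=0.1):
    """Summarize a set of local features as a distribution over prototypes.

    features: (n, d) region or word embeddings.
    prototypes: (k, d) prototype vectors (assumed given here).
    Returns a (k,) vector that sums to 1.
    """
    sims = l2_normalize(features) @ l2_normalize(prototypes).T  # (n, k)
    assignments = softmax(sims / temperature, axis=-1)          # (n, k)
    return assignments.mean(axis=0)


def local_similarity(regions, words, prototypes):
    """Compare image and text through their prototype profiles.

    Cost grows with (n_regions + n_words) * k instead of n_regions * n_words,
    which is the efficiency argument for routing through prototypes.
    """
    p_img = prototype_profile(regions, prototypes)
    p_txt = prototype_profile(words, prototypes)
    return float(l2_normalize(p_img) @ l2_normalize(p_txt))


def global_similarity(regions, words):
    """Single-scale stand-in: cosine similarity of mean-pooled features."""
    g_img = l2_normalize(regions.mean(axis=0))
    g_txt = l2_normalize(words.mean(axis=0))
    return float(g_img @ g_txt)


def joint_similarity(regions, words, prototypes, weight=0.5):
    """Hypothetical convex combination of the local and global scores."""
    local = local_similarity(regions, words, prototypes)
    return weight * local + (1.0 - weight) * global_similarity(regions, words)
```

For instance, with 36 region features and 12 word features of dimension 64 and 8 prototypes, `joint_similarity(regions, words, prototypes)` returns one scalar score per image-text pair. Ranking candidates by that score is one plausible way to run retrieval under these assumptions.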
Pages: 525-538
Page count: 14
Related papers
50 in total
  • [1] Prototype local–global alignment network for image–text retrieval
    Lingtao Meng
    Feifei Zhang
    Xi Zhang
    Changsheng Xu
    [J]. International Journal of Multimedia Information Retrieval, 2022, 11 : 525 - 538
  • [2] Local Alignment with Global Semantic Consistence Network for Image-Text Matching
    Li, Pengwei
    Wu, Shihua
    Lian, Zhichao
    [J]. 2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 652 - 657
  • [3] Mutil-level Local Alignment and Semantic Matching Network for Image-Text Retrieval
    Jiang, Zhukai
    Lian, Zhichao
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 212 - 224
  • [4] HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
    Guo, Jie
    Wang, Meiting
    Zhou, Yan
    Song, Bin
    Chi, Yuhao
    Fan, Wei
    Chang, Jianglong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9189 - 9202
  • [5] HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
    Wang, Shuhuai
    Liu, Zheng
    Pei, Xinlei
    Xu, Junhao
    [J]. SENSORS, 2023, 23 (05)
  • [6] Global Relation-Aware Attention Network for Image-Text Retrieval
    Cao, Jie
    Qian, Shengsheng
    Zhang, Huaiwen
    Fang, Quan
    Xu, Changsheng
    [J]. PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 19 - 28
  • [7] A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval
    Manh-Duy Nguyen
    Binh T Nguyen
    Cathal Gurrin
    [J]. NEW TRENDS IN INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES, 2021, 337 : 510 - 523
  • [8] RELATION-GUIDED NETWORK FOR IMAGE-TEXT RETRIEVAL
    Yang, Yulou
    Shen, Hao
    Yang, Ming
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1856 - 1860
  • [9] Transformer Reasoning Network for Image-Text Matching and Retrieval
    Messina, Nicola
    Falchi, Fabrizio
    Esuli, Andrea
    Amato, Giuseppe
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5222 - 5229
  • [10] Cross-Media Image-Text Retrieval Combined with Global Similarity and Local Similarity
    Li, Zhixin
    Ling, Feng
    Zhang, Canlong
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019, : 145 - 153