Learning hierarchical embedding space for image-text matching

Cited by: 0
Authors
Sun, Hao [1]
Qin, Xiaolin [1]
Liu, Xiaojing [1]
Affiliation
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Information retrieval; cross-modal representation; hierarchical embedding; local alignment
DOI
10.3233/IDA-230214
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There are currently two mainstream strategies for image-text matching. The first, termed joint embedding learning, models the semantic information of both images and sentences in a shared feature subspace; this makes semantic similarity easy to measure, but it captures only the global alignment relationship. The second, termed metric learning, explores local semantic relationships more fully by learning a complex similarity function that directly outputs a score for each image-text pair; however, because every pair must pass through that function, it incurs a much heavier computational burden at the retrieval stage. In this paper, we propose a hierarchical joint embedding model that incorporates local semantic relationships into a joint embedding learning framework. The proposed method learns shared local and global embedding spaces simultaneously, and models the joint local embedding space with respect to specific local similarity labels that are easy to obtain from the lexical information of the corpus. Unlike methods based on metric learning, our model can prepare fixed representations of both images and sentences by concatenating the normalized local and global representations, which makes efficient retrieval feasible. Experiments on two publicly available datasets, Flickr30k and MS-COCO, show that the proposed model achieves competitive performance compared with existing joint embedding learning models.
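The efficiency argument in the abstract rests on each image and sentence being encoded once into a fixed vector. The NumPy sketch below illustrates that idea under stated assumptions: the local and global embeddings are taken as already computed by some encoder, the two parts are concatenated with equal weight, and the helper names (`l2_normalize`, `fixed_representation`, `rank_gallery`) and dimensions are hypothetical, not the authors' code.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of x to unit L2 norm (eps guards against division by zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def fixed_representation(local_feats: np.ndarray, global_feats: np.ndarray) -> np.ndarray:
    """Hypothetical helper: concatenate row-normalized local and global embeddings.

    Assumption: equal weighting of the two parts; the paper may weight them differently.
    """
    return np.concatenate([l2_normalize(local_feats), l2_normalize(global_feats)], axis=1)


def rank_gallery(queries: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return, for each query row, gallery indices sorted by descending dot-product score."""
    scores = queries @ gallery.T  # shape: (num_queries, num_gallery)
    return np.argsort(-scores, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 5 images / 5 sentences, 8-d local and 16-d global embeddings.
    images = fixed_representation(rng.normal(size=(5, 8)), rng.normal(size=(5, 16)))
    sentences = fixed_representation(rng.normal(size=(5, 8)), rng.normal(size=(5, 16)))
    # Sentence-to-image retrieval: gallery vectors can be computed once and reused.
    print(rank_gallery(sentences, images))
```

Because each half of the concatenated vector has unit norm, the dot product of two such vectors equals the local cosine similarity plus the global cosine similarity. Gallery vectors can therefore be precomputed and searched with a single matrix product (or an inner-product index), in contrast to a learned pairwise similarity function that must be evaluated for every query-candidate pair.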
Pages: 647-665
Page count: 19
Related papers
50 in total
  • [1] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [2] Hierarchical Knowledge-Based Graph Embedding Model for Image-Text Matching in IoTs
    Zhang, Lizong
    Li, Meng
    Yan, Ke
    Wang, Ruozhou
    Hui, Bei
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (12) : 9399 - 9409
  • [3] Adversarial Attentive Multi-Modal Embedding Learning for Image-Text Matching
    Wei, Kaimin
    Zhou, Zhibo
    IEEE ACCESS, 2020, 8 : 96237 - 96248
  • [4] Location Attention Knowledge Embedding Model for Image-Text Matching
    Xu, Guoqing
    Hu, Min
    Wang, Xiaohua
    Yang, Jiaoyun
    Li, Nan
    Zhang, Qingyu
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 408 - 421
  • [5] Giving Text More Imagination Space for Image-text Matching
    Dong, Xinfeng
    Han, Longfei
    Zhang, Dingwen
    Liu, Li
    Han, Junwei
    Zhang, Huaxiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6359 - 6368
  • [6] Modality-Invariant Image-Text Embedding for Image-Sentence Matching
    Liu, Ruoyu
    Zhao, Yao
    Wei, Shikui
    Zheng, Liang
    Yang, Yi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [7] Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching
    Dong, Xinfeng
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6437 - 6447
  • [8] Learning Multi-view Embedding in Joint Space for Bidirectional Image-Text Retrieval
    Ran, Lu
    Wang, Wenmin
    2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2017,
  • [9] Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval
    Seo, Sanghyun
    Kim, Juntae
    PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018, : 350 - 353
  • [10] CycleMatch: A cycle-consistent embedding network for image-text matching
    Liu, Yu
    Guo, Yanming
    Liu, Li
    Bakker, Erwin M.
    Lew, Michael S.
    PATTERN RECOGNITION, 2019, 93 : 365 - 379