Learning hierarchical embedding space for image-text matching

Cited by: 0

Authors:

Sun, Hao [1]

Qin, Xiaolin [1]

Liu, Xiaojing [1]

Affiliation:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China

Source:

INTELLIGENT DATA ANALYSIS | 2024, Vol. 28, No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Information retrieval; cross-modal representation; hierarchical embedding; local alignment;

DOI:

10.3233/IDA-230214

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

There are currently two mainstream strategies for image-text matching. The first, joint embedding learning, models the semantic information of both images and sentences in a shared feature subspace; this makes semantic similarity easy to measure but captures only the global alignment relationship. The second, metric learning, explores the local semantic relationship more fully by learning a complex similarity function that directly outputs a score for each image-text pair, but it carries a significantly heavier computational burden at the retrieval stage. In this paper, we propose a hierarchical joint embedding model that incorporates the local semantic relationship into a joint embedding learning framework. The proposed method learns shared local and global embedding spaces simultaneously, and models the joint local embedding space with respect to specific local similarity labels, which are easy to obtain from the lexical information of the corpus. Unlike methods based on metric learning, we can prepare fixed representations of both images and sentences by concatenating the normalized local and global representations, which makes efficient retrieval feasible. Experiments show that the proposed model achieves competitive performance compared with existing joint embedding learning models on two publicly available datasets, Flickr30k and MS-COCO.
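The retrieval step described in the abstract (fixed representations built by concatenating normalized local and global representations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the encoders and training procedure are not shown, the embeddings are hand-written toy values, and the function names `l2_normalize`, `fixed_representation`, and `rank_sentences` are hypothetical. Scoring by dot product is also an assumption; the abstract does not state the similarity measure.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fixed_representation(local_emb, global_emb):
    """Concatenate the normalized local and global embeddings (hypothetical helper)."""
    return np.concatenate(
        [l2_normalize(local_emb), l2_normalize(global_emb)], axis=-1
    )


def rank_sentences(image_repr, sentence_reprs):
    """Rank precomputed sentence representations against one image.

    With unit-norm halves, the dot product equals
    cos(local parts) + cos(global parts), so one matrix-vector
    product scores the whole gallery.
    """
    scores = sentence_reprs @ image_repr
    order = np.argsort(-scores)  # best match first
    return order, scores


# Toy embeddings standing in for encoder outputs (assumed values).
image = fixed_representation(np.array([1.0, 0.0]), np.array([0.0, 2.0]))
sentences = fixed_representation(
    np.array([[2.0, 0.0], [0.0, 1.0]]),  # local embeddings of two sentences
    np.array([[0.0, 1.0], [1.0, 0.0]]),  # global embeddings of two sentences
)
order, scores = rank_sentences(image, sentences)
print(order, scores)
```

Because the representations are fixed, they can be computed once per image and sentence and stored, and a query then needs only a single matrix-vector product. A learned pairwise similarity function, by contrast, has to be evaluated for every image-text pair at query time.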


Pages: 647-665

Page count: 19
