Learning hierarchical embedding space for image-text matching

Cited by: 0
Authors
Sun, Hao [1]
Qin, Xiaolin [1]
Liu, Xiaojing [1]
Affiliation
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Information retrieval; cross-modal representation; hierarchical embedding; local alignment;
DOI
10.3233/IDA-230214
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
There are currently two mainstream strategies for image-text matching. The first, joint embedding learning, models the semantic information of both images and sentences in a shared feature subspace, which makes semantic similarity easy to measure but captures only the global alignment relationship. The second, metric learning, explores local semantic relationships more fully by learning a complex similarity function that directly outputs a score for each image-text pair; however, it incurs a significantly heavier computational burden at the retrieval stage. In this paper, we propose a hierarchical joint embedding model that incorporates local semantic relationships into a joint embedding learning framework. The proposed method learns shared local and global embedding spaces simultaneously, and models the joint local embedding space with respect to specific local similarity labels, which are easily obtained from the lexical information of the corpus. Unlike methods based on metric learning, the model allows fixed representations of both images and sentences to be prepared in advance by concatenating the normalized local and global representations, which makes efficient retrieval feasible. Experiments show that the proposed model achieves competitive performance compared with existing joint embedding learning models on two publicly available datasets, Flickr30k and MS-COCO.
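The retrieval step described in the abstract (concatenating normalized local and global representations into one fixed vector per item) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the encoders, dimensions, or scoring function, so the function names, the toy vectors, and the use of a plain dot product as the similarity are all assumptions made for illustration. Under those assumptions, the dot product of two concatenated vectors equals the sum of the local and global cosine similarities, which is why the representations can be precomputed once and compared cheaply.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fixed_representation(local_vec, global_vec):
    """Hypothetical helper: concatenate the normalized local and global parts."""
    return l2_normalize(local_vec) + l2_normalize(global_vec)


def similarity(rep_a, rep_b):
    """Dot product of two fixed representations (assumed scoring function).

    For concatenated unit vectors this equals
    cos(local_a, local_b) + cos(global_a, global_b).
    """
    return sum(x * y for x, y in zip(rep_a, rep_b))


def rank_images(sentence_rep, image_reps):
    """Return image indices sorted from most to least similar to the sentence."""
    scored = [(similarity(sentence_rep, rep), idx)
              for idx, rep in enumerate(image_reps)]
    return [idx for _, idx in sorted(scored, key=lambda pair: -pair[0])]


# Toy data (made up): each item has a 2-d local and a 2-d global embedding.
sentence = fixed_representation([1.0, 0.0], [0.0, 1.0])
images = [
    fixed_representation([1.0, 0.0], [0.0, 1.0]),  # matches locally and globally
    fixed_representation([0.0, 1.0], [1.0, 0.0]),  # matches neither
    fixed_representation([1.0, 1.0], [0.0, 1.0]),  # partial local match
]
ranking = rank_images(sentence, images)
print(ranking)
```

Because the image vectors do not depend on the query, they can be computed once offline; a metric learning model would instead need to evaluate its similarity function on every image-text pair at query time.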
Pages: 647-665
Page count: 19