Learning hierarchical embedding space for image-text matching

被引:0
|
作者
Sun, Hao [1 ]
Qin, Xiaolin [1 ]
Liu, Xiaojing [1 ]
机构
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
基金
中国国家自然科学基金;
关键词
Information retrieval; cross-modal representation; hierarchical embedding; local alignment;
D O I
10.3233/IDA-230214
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
There are two mainstream strategies for image-text matching at present. The one, termed as joint embedding learning, aims to model the semantic information of both image and sentence in a shared feature subspace, which facilitates the measurement of semantic similarity but only focuses on global alignment relationship. To explore the local semantic relationship more fully, the other one, termed as metric learning, aims to learn a complex similarity function to directly output score of each image-text pair. However, it significantly suffers from more computation burden at retrieval stage. In this paper, we propose a hierarchically joint embedding model to incorporate the local semantic relationship into a joint embedding learning framework. The proposed method learns the shared local and global embedding spaces simultaneously, and models the joint local embedding space with respect to specific local similarity labels which are easy to access from the lexical information of corpus. Unlike the methods based on metric learning, we can prepare the fixed representations of both images and sentences by concatenating the normalized local and global representations, which makes it feasible to perform the efficient retrieval. And experiments show that the proposed model can achieve competitive performance when compared to the existing joint embedding learning models on two publicly available datasets Flickr30k and MS-COCO.
引用
下载
收藏
页码:647 / 665
页数:19
相关论文
共 50 条
  • [21] Bi-Attention enhanced representation learning for image-text matching
    Tian, Yumin
    Ding, Aqiang
    Wang, Di
    Luo, Xuemei
    Wan, Bo
    Wang, Yifeng
    PATTERN RECOGNITION, 2023, 140
  • [22] Learning Image-Text Associations
    Jiang, Tao
    Tan, Ah-Hwee
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (02) : 161 - 177
  • [23] Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching
    Chen, Tianlang
    Luo, Jiebo
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10583 - 10590
  • [24] Deep Cross-Modal Projection Learning for Image-Text Matching
    Zhang, Ying
    Lu, Huchuan
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 707 - 723
  • [25] ITContrast: contrastive learning with hard negative synthesis for image-text matching
    Wu, Fangyu
    Wang, Qiufeng
    Wang, Zhao
    Yu, Siyue
    Li, Yushi
    Zhang, Bailing
    Lim, Eng Gee
    VISUAL COMPUTER, 2024, 40 (12): : 8825 - 8838
  • [26] Learning Fragment Self-Attention Embeddings for Image-Text Matching
    Wu, Yiling
    Wang, Shuhui
    Song, Guoli
    Huang, Qingming
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2088 - 2096
  • [27] Learning Dual Semantic Relations With Graph Attention for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    Cheng, Qingrong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07) : 2866 - 2879
  • [28] Self-attention guided representation learning for image-text matching
    Qi, Xuefei
    Zhang, Ying
    Qi, Jinqing
    Lu, Huchuan
    NEUROCOMPUTING, 2021, 450 : 143 - 155
  • [29] HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
    Wang, Shuhuai
    Liu, Zheng
    Pei, Xinlei
    Xu, Junhao
    SENSORS, 2023, 23 (05)
  • [30] Similarity Reasoning and Filtration for Image-Text Matching
    Diao, Haiwen
    Zhang, Ying
    Ma, Lin
    Lu, Huchuan
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1218 - 1226