Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval

被引:2
|
作者
Seo, Sanghyun [1 ]
Kim, Juntae [1 ]
机构
[1] Dongguk Univ, 30,Pildong Ro 1 Gil, Seoul, South Korea
基金
新加坡国家研究基金会;
关键词
Heterogeneous Data Embedding; Image Text Embedding; Hierarchical Knowledge; Cross-modal Retrieval;
D O I
10.1145/3297156.3297244
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Heterogeneous data embedding is a process of mapping different kinds of data into a common vector space of a certain dimension. Image-text embedding also means mapping image and text data that have completely different characteristics into a common vector space. In this paper, we propose an image-text embedding method using hierarchical knowledge such as coarse and fine labels of text data. The proposed method improves the training efficiency of the embedding model by fixing the coarse label vectors. In addition, the loss function is designed by arbitrarily selecting the negative sample from the fine labels having a hierarchical relationship with the coarse label, so that the difference between the vectors of the fine labels which have same coarse label becomes larger. So, when the images that are visual data is mapped into a common vector space, the semantic of images becomes clear. Experimental results show that embedding with hierarchical knowledge has been successfully performed using the proposed methodology and that cross-modal retrieval can be efficiently performed through embedding model.
引用
收藏
页码:350 / 353
页数:4
相关论文
共 50 条
  • [1] Image-Text Cross-Modal Retrieval with Instance Contrastive Embedding
    Zeng, Ruigeng
    Ma, Wentao
    Wu, Xiaoqian
    Liu, Wei
    Liu, Jie
    [J]. ELECTRONICS, 2024, 13 (02)
  • [2] Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
    Mithun, Niluthpol Chowdhury
    Panda, Rameswar
    Papalexakis, Evangelos E.
    Roy-Chowdhury, Amit K.
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1856 - 1864
  • [3] Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval
    Zeng, Sheng
    Liu, Changhong
    Zhou, Jun
    Chen, Yong
    Jiang, Aiwen
    Li, Hanxi
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 239 - 248
  • [4] Hierarchical modal interaction balance cross-modal hashing for unsupervised image-text retrieval
    Zhang, Jie
    Lin, Ziyong
    Jiang, Xiaolong
    Li, Mingyong
    Wang, Chao
    [J]. Multimedia Tools and Applications, 2024, 83 (42) : 90487 - 90509
  • [5] Cross-modal Image-Text Retrieval with Multitask Learning
    Luo, Junyu
    Shen, Ying
    Ao, Xiang
    Zhao, Zhou
    Yang, Min
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2309 - 2312
  • [6] Rethinking Benchmarks for Cross-modal Image-text Retrieval
    Chen, Weijing
    Yao, Linli
    Jin, Qin
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1241 - 1251
  • [7] Cross-Modal Image-Text Retrieval with Semantic Consistency
    Chen, Hui
    Ding, Guiguang
    Lin, Zijin
    Zhao, Sicheng
    Han, Jungong
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1749 - 1757
  • [8] Cross-modal alignment with graph reasoning for image-text retrieval
    Zheng Cui
    Yongli Hu
    Yanfeng Sun
    Junbin Gao
    Baocai Yin
    [J]. Multimedia Tools and Applications, 2022, 81 : 23615 - 23632
  • [9] Image-Text Retrieval With Cross-Modal Semantic Importance Consistency
    Liu, Zejun
    Chen, Fanglin
    Xu, Jun
    Pei, Wenjie
    Lu, Guangming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (05) : 2465 - 2476
  • [10] Cross-modal alignment with graph reasoning for image-text retrieval
    Cui, Zheng
    Hu, Yongli
    Sun, Yanfeng
    Gao, Junbin
    Yin, Baocai
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 23615 - 23632