Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Times Cited: 23
|
Authors
Cheng, Qingrong [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding;
DOI
10.1016/j.neunet.2020.11.011
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different multimedia data types create a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that, rather than projecting the original features into an aligned representation space, adopts a cross-modal graph to link the different modalities. The GRL approach consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity suitably describes the similarity between samples. Then, we build a cross-modal graph that reconstructs the original data and their relationships. Finally, we abandon the features in the latent space and instead embed the graph vertices directly into a common representation space. In this way, the proposed method bypasses the most challenging issue by using the cross-modal graph as an intermediary agent that bridges the "heterogeneity gap" among modalities, an approach that is simple but effective. Extensive experimental results on six widely used datasets show that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
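The pipeline described in the abstract (modality-specific features, a cosine-similarity cross-modal graph, then a single embedding space for all vertices) can be illustrated with a minimal sketch. This is not the paper's FTLN/GRLN architecture: the feature matrices are synthetic, the top-k graph construction and the spectral embedding are hypothetical stand-ins chosen only to make the three stages concrete.

```python
import numpy as np

def cosine_sim(A, B):
    """Row-wise cosine similarity between two feature matrices."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_n @ B_n.T

def build_cross_modal_graph(img_feats, txt_feats, k=2):
    """Link each image vertex to its k most cosine-similar text vertices
    (a simplified stand-in for the paper's cross-modal graph)."""
    n_img = len(img_feats)
    sim = cosine_sim(img_feats, txt_feats)
    n = n_img + len(txt_feats)
    adj = np.zeros((n, n))
    for i in range(n_img):
        for j in np.argsort(-sim[i])[:k]:           # k nearest text nodes
            adj[i, n_img + j] = adj[n_img + j, i] = sim[i, j]
    return adj

def embed_vertices(adj, dim=4):
    """Toy spectral embedding: leading eigenvectors of the symmetric
    adjacency place every vertex, image or text, in one common space."""
    _, vecs = np.linalg.eigh(adj)
    return vecs[:, -dim:]

rng = np.random.default_rng(0)
img = rng.normal(size=(5, 16))   # 5 synthetic "image" features
txt = rng.normal(size=(5, 16))   # 5 synthetic "text" features
adj = build_cross_modal_graph(img, txt)
emb = embed_vertices(adj)
print(emb.shape)                 # all 10 vertices share one embedding space
```

Once all vertices live in the common space, retrieval reduces to ranking text embeddings by their distance to an image embedding (and vice versa), which is the payoff the abstract describes.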
Pages: 143 - 162
Page count: 20