Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Cited by: 23
|
Authors
Cheng, Qingrong [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding;
DOI
10.1016/j.neunet.2020.11.011
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different multimedia data types cause a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link the modalities. GRL consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity suitably describes item similarity. Then, we build a cross-modal graph that reconstructs the original data and their relationships. Finally, we abandon the features in the latent space and instead embed the graph vertices directly into a common representation space. In this way, the proposed method bypasses the most challenging issue by using the cross-modal graph as an intermediary agent to bridge the "heterogeneity gap", a simple but effective approach. Extensive experiments on six widely used datasets indicate that GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
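The pipeline sketched in the abstract (per-modality latent projections, a similarity-weighted cross-modal graph, then vertex embedding into one common space) can be illustrated with a toy NumPy sketch. This is a hypothetical simplification, not the authors' implementation: the learned FTLN and GRLN subnetworks are replaced here by fixed random projections and a single neighbourhood-averaging propagation step, and all array shapes are invented for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)

# Stand-in for FTLN: fixed random projections into 64-d latent spaces
# (in the paper these are learned so cosine similarity is meaningful).
W_img = rng.normal(size=(512, 64))
W_txt = rng.normal(size=(300, 64))

img_feats = rng.normal(size=(4, 512))   # 4 toy image items
txt_feats = rng.normal(size=(4, 300))   # 4 toy text items

img_lat = img_feats @ W_img             # per-modality latent features
txt_lat = txt_feats @ W_txt

# Cross-modal graph: vertices are all 8 items from both modalities,
# edge weights are cosine similarities between latent features.
lat = np.vstack([img_lat, txt_lat])
A = cosine_sim(lat, lat)
np.fill_diagonal(A, 0.0)                # no self-loops
A = np.maximum(A, 0.0)                  # keep only positive affinities

# Stand-in for GRLN: one row-normalised propagation step embeds each
# vertex as a weighted average of its graph neighbours, so images and
# texts end up in a single common representation space.
D = A.sum(axis=1, keepdims=True) + 1e-8
common = (A / D) @ lat

print(common.shape)  # (8, 64): all items now live in one space
```

The key design point the sketch mirrors is that items from both modalities are compared only through the graph, not through a direct projection of raw features into one aligned space.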
Pages: 143-162
Page count: 20
Related Papers
50 records
  • [41] Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval
    Xie, Liang
    Zhu, Lei
    Chen, Guoqi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (15) : 9185 - 9204
  • [43] Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation
    Guo, Weikuo
    Huang, Huaibo
    Kong, Xiangwei
    He, Ran
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1712 - 1720
  • [44] Graph Convolutional Network Hashing for Cross-Modal Retrieval
    Xu, Ruiqing
    Li, Chao
    Yan, Junchi
    Deng, Cheng
    Liu, Xianglong
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 982 - 988
  • [45] Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning
    Huang, Zhao
    Hu, Haowu
    Su, Miao
    ENTROPY, 2023, 25 (08)
  • [46] Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval
    Zhu, Lei
    Song, Jiayu
    Zhu, Xiaofeng
    Zhang, Chengyuan
    Zhang, Shichao
    Yuan, Xinpan
    IEEE MULTIMEDIA, 2020, 27 (04) : 79 - 90
  • [47] Probability Distribution Representation Learning for Image-Text Cross-Modal Retrieval
    Yang C.
    Liu L.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2022, 34 (05): : 751 - 759
  • [48] Collaborative Subspace Graph Hashing for Cross-modal Retrieval
    Zhang, Xiang
    Dong, Guohua
    Du, Yimo
    Wu, Chengkun
    Luo, Zhigang
    Yang, Canqun
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 213 - 221
  • [49] Adversarial Graph Convolutional Network for Cross-Modal Retrieval
    Dong, Xinfeng
    Liu, Li
    Zhu, Lei
    Nie, Liqiang
    Zhang, Huaxiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1634 - 1645
  • [50] Real-world Cross-modal Retrieval via Sequential Learning
    Song, Ge
    Tan, Xiaoyang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1708 - 1721