Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Cited: 23
|
Authors
Cheng, Qingrong [1 ]
Gu, Xiaodong [1 ]
Affiliation
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding;
DOI
10.1016/j.neunet.2020.11.011
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Number
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different kinds of multimedia data create a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link the different modalities. GRL consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity is suitable for describing sample similarity. Then, we build a cross-modal graph to reconstruct the original data and their relationships. Finally, we discard the latent-space features and directly embed the graph vertices into a common representation space. In this way, the proposed method bypasses the most challenging issue by using the cross-modal graph as an intermediary agent that bridges the "heterogeneity gap" among modalities, an approach that is simple but effective. Extensive experimental results on six widely used datasets indicate that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
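The pipeline described in the abstract (latent features per modality, a cross-modal graph built from cosine similarities, then vertex embedding into a common space) can be sketched minimally as follows. This is only an illustration under assumed simplifications: random toy features stand in for the learned FTLN projections, a similarity threshold stands in for the paper's graph construction, and a spectral embedding of the graph Laplacian stands in for the learned GRLN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent features for 4 images and 4 texts (hypothetical stand-ins for
# the FTLN outputs; the actual paper learns these projections).
img = rng.normal(size=(4, 8))
txt = rng.normal(size=(4, 8))

def cosine(a, b):
    """Pairwise cosine similarity between the rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Cross-modal graph: vertices are all images and texts; edges link
# image-text pairs whose cosine similarity exceeds a threshold
# (an assumed, simplified construction rule).
n_img, n_txt = img.shape[0], txt.shape[0]
sim = cosine(img, txt)
adj = np.zeros((n_img + n_txt, n_img + n_txt))
adj[:n_img, n_img:] = sim > 0.0          # image -> text edges
adj[n_img:, :n_img] = adj[:n_img, n_img:].T  # keep the graph undirected

# Embed every vertex into one shared space via a spectral embedding of the
# normalized graph Laplacian (a classical stand-in for the learned GRLN).
deg = adj.sum(axis=1) + 1e-9
d_inv_sqrt = np.diag(deg ** -0.5)
lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
eigvals, eigvecs = np.linalg.eigh(lap)
embedding = eigvecs[:, 1:4]              # 3-dim common representation

print(embedding.shape)                   # (8, 3): all 8 vertices share one space
```

After this step, retrieval reduces to nearest-neighbor search among the vertex embeddings, since image and text nodes now live in the same space; this is the sense in which the graph serves as the bridge across the heterogeneity gap.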
Pages: 143-162
Page count: 20
Related Papers
50 records in total
  • [31] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [32] Combining Link and Content Correlation Learning for Cross-Modal Retrieval in Social Multimedia
    Zhang, Longtao
    Liu, Fangfang
    Zeng, Zhimin
    HUMAN CENTERED COMPUTING, HCC 2017, 2018, 10745 : 516 - 526
  • [33] Deep Semantic Correlation Learning based Hashing for Multimedia Cross-Modal Retrieval
    Gong, Xiaolong
    Huang, Linpeng
    Wang, Fuwei
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 117 - 126
  • [34] On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval
    Costa Pereira, Jose
    Coviello, Emanuele
    Doyle, Gabriel
    Rasiwasia, Nikhil
    Lanckriet, Gert R. G.
    Levy, Roger
    Vasconcelos, Nuno
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (03) : 521 - 535
  • [35] Discrete Cross-Modal Hashing for Efficient Multimedia Retrieval
    Ma, Dekui
    Liang, Jian
    Kong, Xiangwei
    He, Ran
    Li, Ying
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 38 - 43
  • [36] Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval
    Cheng, Miaomiao
    Jing, Liping
    Ng, Michael K.
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2020, 38 (03)
  • [37] Cross-Modal Learning to Rank via Latent Joint Representation
    Wu, Fei
    Jiang, Xinyang
    Li, Xi
    Tang, Siliang
    Lu, Weiming
    Zhang, Zhongfei
    Zhuang, Yueting
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (05) : 1497 - 1509
  • [38] Cross-Modal Discrete Representation Learning
    Liu, Alexander H.
    Jin, SouYoung
    Lai, Cheng-I Jeff
    Rouditchenko, Andrew
    Oliva, Aude
    Glass, James
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3013 - 3035
  • [39] Adversarial pre-optimized graph representation learning with double-order sampling for cross-modal retrieval
    Cheng, Qingrong
    Guo, Qi
    Gu, Xiaodong
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [40] Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval
    Jin, Weike
    Zhao, Zhou
    Zhang, Pengcheng
    Zhu, Jieming
    He, Xiuqiang
    Zhuang, Yueting
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1114 - 1124