Cross-modal Retrieval with Correspondence Autoencoder

Cited by: 415
Authors
Feng, Fangxiang [1 ]
Wang, Xiaojie [1 ]
Li, Ruifan [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Keywords
Cross-modal; retrieval; image and text; deep learning; autoencoder
DOI
10.1145/2647868.2654902
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
This paper considers the problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa. A novel model, the correspondence autoencoder (Corr-AE), is proposed for solving this problem. The model is constructed by correlating the hidden representations of two uni-modal autoencoders. A novel objective, which minimizes a linear combination of the representation learning error for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that capture only the information common to the two modalities, while minimizing the representation learning error keeps the hidden representations good enough to reconstruct the input of each modality. A parameter alpha balances the representation learning error against the correlation learning error. Based on two different multi-modal autoencoders, Corr-AE is further extended to two other correspondence models, called Corr-Cross-AE and Corr-Full-AE. The proposed models are evaluated on three publicly available data sets collected from real scenes. The three correspondence autoencoders are shown to perform significantly better than three canonical correlation analysis based models and two popular multi-modal deep models on cross-modal retrieval tasks.
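The abstract describes the Corr-AE objective fully enough to sketch it in code. The following is a minimal PyTorch sketch, not the authors' implementation: the layer sizes, sigmoid activations, squared-error losses, and the (1 - alpha)/alpha weighting are illustrative assumptions based on the abstract's description and may differ from the paper's exact formulation.

```python
import torch
import torch.nn as nn

class CorrAE(nn.Module):
    """Sketch of a correspondence autoencoder: two uni-modal autoencoders
    whose hidden codes are tied by a correlation (distance) term."""

    def __init__(self, img_dim=4096, txt_dim=3000, hidden_dim=1024):
        super().__init__()
        # Image-branch encoder and decoder (sizes are illustrative)
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden_dim), nn.Sigmoid())
        self.img_dec = nn.Linear(hidden_dim, img_dim)
        # Text-branch encoder and decoder
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, hidden_dim), nn.Sigmoid())
        self.txt_dec = nn.Linear(hidden_dim, txt_dim)

    def forward(self, img, txt):
        h_img = self.img_enc(img)
        h_txt = self.txt_enc(txt)
        return h_img, h_txt, self.img_dec(h_img), self.txt_dec(h_txt)


def corr_ae_loss(img, txt, model, alpha=0.2):
    """Linear combination of reconstruction and correlation errors.
    The (1 - alpha)/alpha split is an assumption for illustration;
    alpha balances the two terms as stated in the abstract."""
    h_img, h_txt, img_rec, txt_rec = model(img, txt)
    # Representation-learning (reconstruction) error for each modality
    rec_err = ((img_rec - img) ** 2).sum(dim=1) + ((txt_rec - txt) ** 2).sum(dim=1)
    # Correlation-learning error between the two hidden representations
    corr_err = ((h_img - h_txt) ** 2).sum(dim=1)
    return ((1 - alpha) * rec_err + alpha * corr_err).mean()


# Usage sketch with random stand-in features
model = CorrAE()
img = torch.rand(8, 4096)   # e.g. visual features for a batch of 8 items
txt = torch.rand(8, 3000)   # e.g. bag-of-words text features
loss = corr_ae_loss(img, txt, model)
loss.backward()
```

At retrieval time, both modalities would be mapped into the shared hidden space by their respective encoders and ranked by distance there, which is what allows a text query to retrieve images and vice versa.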
Pages: 7-16
Number of pages: 10
Related Papers
50 records in total
  • [1] Correspondence Autoencoders for Cross-Modal Retrieval
    Feng, Fangxiang
    Wang, Xiaojie
    Li, Ruifan
    Ahmad, Ibrar
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2015, 12 (01)
  • [2] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    [J]. NEUROCOMPUTING, 2019, 331 : 165 - 175
  • [3] Deep supervised multimodal semantic autoencoder for cross-modal retrieval
    Tian, Yu
    Yang, Wenjing
    Liu, Qingsong
    Yang, Qiong
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)
  • [4] ONION: Online Semantic Autoencoder Hashing for Cross-Modal Retrieval
    Zhang, Donglin
    Wu, Xiao-Jun
    Chen, Guoqing
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (02)
  • [5] Deep correspondence restricted Boltzmann machine for cross-modal retrieval
    Feng, Fangxiang
    Li, Ruifan
    Wang, Xiaojie
    [J]. NEUROCOMPUTING, 2015, 154 : 50 - 60
  • [6] Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval
    Qin, Yang
    Peng, Dezhong
    Peng, Xi
    Wang, Xu
    Hu, Peng
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4948 - 4956
  • [7] Cross-Modal Retrieval Based on Full-Modal Autoencoder with Generative Adversarial Mechanism
    Zhao, Peng
    Ma, Taiyu
    Li, Yi
    Liu, Huiting
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2021, 33 (10): : 1486 - 1494
  • [8] Variational Autoencoder with CCA for Audio-Visual Cross-modal Retrieval
    Zhang, Jiwei
    Yu, Yi
    Tang, Suhua
    Wu, Jianming
    Li, Wei
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (03)
  • [9] Autoencoder-based self-supervised hashing for cross-modal retrieval
    Li, Yifan
    Wang, Xuan
    Cui, Lei
    Zhang, Jiajia
    Huang, Chengkai
    Luo, Xuan
    Qi, Shuhan
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 17257 - 17274
  • [10] Cross-Modal Retrieval With Noisy Correspondence via Consistency Refining and Mining
    Ma, Xinran
    Yang, Mouxing
    Li, Yunfan
    Hu, Peng
    Lv, Jiancheng
    Peng, Xi
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2587 - 2598