Cross-Modal Retrieval Based on Full-Modal Autoencoder with Generative Adversarial Mechanism

Cited by: 0
Authors
Zhao P. [1,2]
Ma T. [1,2]
Li Y. [1,2]
Liu H. [1,2]
Institutions
[1] Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei
[2] School of Computer Science and Technology, Anhui University, Hefei
Keywords
Cross-modal retrieval; Full-modal autoencoder; Generative adversarial networks
DOI
10.3724/SP.J.1089.2021.18757
Abstract
Existing cross-modal retrieval methods based on generative adversarial networks cannot fully explore inter-modality invariance. To address this problem, a novel cross-modal retrieval method based on a full-modal autoencoder with a generative adversarial mechanism is proposed. Two parallel full-modal autoencoders are introduced to embed samples of different modalities into a common space. Each full-modal autoencoder not only reconstructs the feature representation of its own modality, but also reconstructs the feature representation of the other modality. A classifier is designed to predict the categories of the embedded features in the common space, preserving the semantic discriminative information of the samples. Three discriminators are designed to determine the modal categories of the input features, and they work cooperatively to fully explore inter-modality invariance. Mean average precision (mAP) is used to evaluate retrieval accuracy, and extensive experiments are conducted on three public datasets: Pascal Sentence, Wikipedia, and NUS-WIDE-10k. Compared with ten state-of-the-art cross-modal retrieval methods, including both traditional and deep learning methods, the mAP of the proposed method improves by at least 4.8%, 1.4%, and 1.1% on the three datasets, respectively. The experimental results demonstrate the effectiveness of the proposed method. © 2021, Beijing China Science Journal Publishing Co. Ltd. All rights reserved.
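The following is a minimal PyTorch-style sketch of the architecture described in the abstract, assuming 4096-dimensional image features, 300-dimensional text features, a 512-dimensional common space, and 10 semantic categories; all layer sizes, module names, and the choice of which features the three discriminators inspect are illustrative assumptions, not the authors' actual configuration.

import torch
import torch.nn as nn

class FullModalAutoencoder(nn.Module):
    # Encodes one modality into the common space and reconstructs BOTH modalities.
    def __init__(self, in_dim, common_dim, img_dim, txt_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, common_dim))
        # Two decoders: one rebuilds the image features, the other rebuilds the text features.
        self.dec_img = nn.Sequential(nn.Linear(common_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, img_dim))
        self.dec_txt = nn.Sequential(nn.Linear(common_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, txt_dim))

    def forward(self, x):
        z = self.encoder(x)                        # embedding in the common space
        return z, self.dec_img(z), self.dec_txt(z)

class ModalityDiscriminator(nn.Module):
    # Predicts which modality (image vs. text) an input feature vector came from.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x):
        return self.net(x)

img_ae = FullModalAutoencoder(in_dim=4096, common_dim=512, img_dim=4096, txt_dim=300)
txt_ae = FullModalAutoencoder(in_dim=300, common_dim=512, img_dim=4096, txt_dim=300)
classifier = nn.Linear(512, 10)        # preserves semantic (category) information in the common space
d_common = ModalityDiscriminator(512)  # adversary on common-space embeddings (assumed input)
d_img = ModalityDiscriminator(4096)    # adversary on reconstructed image features (assumed input)
d_txt = ModalityDiscriminator(300)     # adversary on reconstructed text features (assumed input)

# Illustrative forward pass on a batch of 8 image/text feature pairs.
img_feat, txt_feat = torch.randn(8, 4096), torch.randn(8, 300)
z_img, rec_img_from_img, rec_txt_from_img = img_ae(img_feat)
z_txt, rec_img_from_txt, rec_txt_from_txt = txt_ae(txt_feat)
category_logits = classifier(torch.cat([z_img, z_txt], dim=0))

In an adversarial training loop, the two autoencoders would be optimized to fool the discriminators while the discriminators learn to separate the modalities, which is how a modality-invariant common space would be encouraged; the exact loss weighting is not specified in the abstract.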
Pages: 1486-1494
Number of pages: 8