Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia

Cited by: 12
Authors
Fan, Mengdi [1]
Wang, Wenmin [1]
Dong, Peilei [1]
Han, Liang [1]
Wang, Ronggang [1]
Li, Ge [1]
Affiliation
[1] Peking Univ, Beijing, Peoples R China
Keywords
Cross-media retrieval; rich semantic embeddings; multi-sensory fusion; TextNet
DOI
10.1145/3123266.3123369
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Cross-media retrieval seeks semantic associations between different media types. Most existing methods focus on learning mapping functions or finding optimal spaces, but neglect how people accurately perceive and understand images and texts. This paper proposes a brain-inspired cross-media retrieval framework that learns rich semantic embeddings of multimedia. Rather than directly using off-the-shelf image features, we combine the visual and descriptive senses of an image, from the viewpoint of human perception, via a joint model called the multi-sensory fusion network (MSFN). A topic-model-based TextNet maps texts into the same semantic space as images according to their shared ground-truth labels. Moreover, to overcome the shortage of data for training neural networks and the limited complexity of text forms, we introduce a large-scale image-text dataset called the Britannica dataset. Extensive experiments show the effectiveness of our framework for texts of different lengths on three benchmark datasets as well as the Britannica dataset. Most notably, we report the best known average results for Img2Text and Text2Img compared with several state-of-the-art methods.
Pages: 1698-1706
Page count: 9