A deep semantic framework for multimodal representation learning

Authors
Cheng Wang
Haojin Yang
Christoph Meinel
Affiliation
[1] University of Potsdam,Hasso Plattner Institute
Source
Keywords
Multimodal representation; Deep neural networks; Semantic feature; Cross-modal retrieval;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Multimodal representation learning has gained increasing importance in various real-world multimedia applications. Most previous approaches focused on exploring inter-modal correlation by learning a common or intermediate space in a conventional way, e.g., Canonical Correlation Analysis (CCA). These works neglected the exploration of fusing multiple modalities at a higher semantic level. In this paper, inspired by the success of deep networks in multimedia computing, we propose a novel unified deep neural framework for multimodal representation learning. To capture the high-level semantic correlations across modalities, we adopt deep learning features as the image representation and topic features as the text representation. In joint model learning, a 5-layer neural network is designed and enforced with supervised pre-training in the first 3 layers for intra-modal regularization. Extensive experiments on the benchmark Wikipedia and MIR Flickr 25K datasets show that our approach achieves state-of-the-art results compared to both shallow and deep models in multimodal and cross-modal retrieval.
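The abstract gives only an outline of the architecture, so the following is a minimal sketch rather than the authors' model. It assumes that layers 1 to 3 are modality-specific stacks (one for image features, one for text topic features) and that layers 4 and 5 are shared layers over the concatenated outputs. The layer sizes, sigmoid activations, random weights, and the names `build_model` and `forward` are all hypothetical, and the supervised pre-training step is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random placeholder weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, layer):
    """Fully connected layer followed by a sigmoid activation."""
    weights, biases = layer
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


def build_model(image_dim, text_dim, hidden_dim, joint_dim, seed=0):
    """Assumed layout: 3 layers per modality, then 2 shared (joint) layers."""
    rng = random.Random(seed)

    def stack(sizes):
        return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

    return {
        "image": stack([image_dim, hidden_dim, hidden_dim, hidden_dim]),
        "text": stack([text_dim, hidden_dim, hidden_dim, hidden_dim]),
        "joint": stack([2 * hidden_dim, hidden_dim, joint_dim]),
    }


def forward(model, image_feat, text_feat):
    """Map one image feature vector and one text feature vector to a joint vector."""
    for layer in model["image"]:      # layers 1-3, image branch
        image_feat = dense(image_feat, layer)
    for layer in model["text"]:       # layers 1-3, text branch
        text_feat = dense(text_feat, layer)
    joint = image_feat + text_feat    # list concatenation of both branches
    for layer in model["joint"]:      # layers 4-5, shared
        joint = dense(joint, layer)
    return joint


# Toy inputs standing in for a deep image feature and a text topic feature.
model = build_model(image_dim=8, text_dim=4, hidden_dim=6, joint_dim=3)
joint_repr = forward(model, [0.5] * 8, [0.25] * 4)
print(len(joint_repr))
```

In a retrieval setting, vectors such as `joint_repr` would be compared across items, for example by cosine similarity; that step is likewise an assumption and is left out here.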
Pages: 9255 - 9276
Page count: 21
Related papers
50 in total
  • [1] A deep semantic framework for multimodal representation learning
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (15) : 9255 - 9276
  • [2] A Deep Multimodal Representation Learning Framework for Accurate Molecular Properties Prediction
    Yang, Yuxin
    Wang, Zixu
    Ahadian, Pegah
    Jerger, Abby
    Zucker, Jeremy
    Feng, Song
    Cheng, Feixiong
    Guan, Qiang
    [J]. PROCEEDING OF THE GREAT LAKES SYMPOSIUM ON VLSI 2024, GLSVLSI 2024, 2024, : 760 - 765
  • [3] Deep Multimodal Representation Learning: A Survey
    Guo, Wenzhong
    Wang, Jianwen
    Wang, Shiping
    [J]. IEEE ACCESS, 2019, 7 : 63373 - 63394
  • [4] Survey of Research on Deep Multimodal Representation Learning
    Pan, Mengzhu
    Li, Qianmu
    Qiu, Tian
    [J]. Computer Engineering and Applications, 2024, 59 (02) : 48 - 64
  • [5] Multimodal deep representation learning for video classification
    Tian, Haiman
    Tao, Yudong
    Pouyanfar, Samira
    Chen, Shu-Ching
    Shyu, Mei-Ling
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (03) : 1325 - 1341
  • [7] Multimodal Deep Learning in Semantic Image Segmentation: A Review
    Raman, Vishal
    Kumari, Madhu
    [J]. PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTERNET OF THINGS (CCIOT 2018), 2018, : 7 - 11
  • [8] Deep Multimodal Representation Learning from Temporal Data
    Yang, Xitong
    Ramesh, Palghat
    Chitta, Radha
    Madhvanath, Sriganesh
    Bernal, Edgar A.
    Luo, Jiebo
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5066 - 5074
  • [9] Deep learning with multimodal representation for pancancer prognosis prediction
    Cheerla, Anika
    Gevaert, Olivier
    [J]. BIOINFORMATICS, 2019, 35 (14) : I446 - I454
  • [10] Semantic Representation Based on Deep Learning for Spam Detection
    Saidani, Nadjate
    Adi, Kamel
    Allili, Mohand Said
    [J]. FOUNDATIONS AND PRACTICE OF SECURITY, FPS 2019, 2020, 12056 : 72 - 81