Representation learning using step-based deep multi-modal autoencoders

Cited: 15
Authors
Bhatt, Gaurav [1 ]
Jha, Piyush [2 ]
Raman, Balasubramanian [1 ]
Affiliations
[1] IITR, Roorkee 247667, Uttar Pradesh, India
[2] MNIT, Jaipur 302017, Rajasthan, India
Keywords
Representation learning; Transfer learning; Convolution autoencoders; Multilingual document classification; Canonical correlation analysis; Classification; Subspace; Network
DOI
10.1016/j.patcog.2019.05.032
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Deep learning techniques have been successfully used to learn a common representation for multi-view data, wherein different modalities are projected onto a common subspace. Broadly, the techniques used to investigate common representation learning fall into two categories: 'canonical correlation-based' approaches and 'autoencoder-based' approaches. In this paper, we investigate the performance of deep autoencoder-based methods on multi-view data. We propose a novel step-based correlation multi-modal deep convolutional neural network (CorrMCNN) which reconstructs one view of the data given the other while increasing the interaction between the representations at each hidden layer, i.e., at every intermediate step. The idea of step reconstruction relaxes the constraint of reconstructing the original data; instead, the objective function is optimized to reconstruct representative features. This helps the proposed model generalize efficiently to representation and transfer learning tasks on high-dimensional data. Finally, we evaluate the proposed model on three multi-view and cross-modal problems, viz., audio articulation, cross-modal image retrieval and multilingual (cross-language) document classification. Through extensive experiments, we find that the proposed model performs much better than current state-of-the-art deep learning techniques on all three multi-view and cross-modal tasks. (C) 2019 Elsevier Ltd. All rights reserved.
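The abstract describes the step-based correlation idea only at a high level. Below is a minimal PyTorch sketch of that idea, not the authors' CorrMCNN implementation: it uses small fully connected encoders rather than convolutional ones, and the layer sizes, the correlation weight `lam`, and the names `StepCorrAE` and `correlation` are illustrative assumptions. It couples the two views' hidden codes at every encoder layer (an MSE term plus a correlation term) and adds a cross-view reconstruction loss at the top, which is the "step" structure the abstract refers to.

```python
# Illustrative sketch only (not the authors' code): a two-view autoencoder whose
# loss ties the views together at every hidden layer, in the spirit of the
# step-based correlation objective described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

def correlation(h1, h2, eps=1e-8):
    # Mean per-dimension Pearson correlation between two batches of hidden codes.
    h1 = h1 - h1.mean(dim=0, keepdim=True)
    h2 = h2 - h2.mean(dim=0, keepdim=True)
    num = (h1 * h2).sum(dim=0)
    den = torch.sqrt((h1 ** 2).sum(dim=0) * (h2 ** 2).sum(dim=0)) + eps
    return (num / den).mean()

class StepCorrAE(nn.Module):
    # Two fully connected encoders whose layer-wise codes are pushed to agree,
    # plus a decoder that reconstructs view y from view x's top code.
    def __init__(self, dim_x, dim_y, hidden=(256, 64)):
        super().__init__()
        def make(d_in):
            dims = [d_in, *hidden]
            return nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:]))
        self.enc_x, self.enc_y = make(dim_x), make(dim_y)
        self.dec_y = nn.Linear(hidden[-1], dim_y)

    def encode(self, layers, v):
        hs = []
        for lin in layers:
            v = torch.relu(lin(v))
            hs.append(v)
        return hs

    def forward(self, x, y, lam=0.1):
        hx, hy = self.encode(self.enc_x, x), self.encode(self.enc_y, y)
        # Step losses: intermediate codes of the two views should match and correlate.
        loss = sum(F.mse_loss(a, b) - lam * correlation(a, b) for a, b in zip(hx, hy))
        # Cross reconstruction: predict view y from view x's final code.
        return loss + F.mse_loss(self.dec_y(hx[-1]), y)

# Toy usage with random tensors standing in for two views of the same samples.
x, y = torch.randn(32, 100), torch.randn(32, 80)
loss = StepCorrAE(100, 80)(x, y)
loss.backward()
```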
Pages: 12-23
Page count: 12
Related papers (50 in total)
  • [21] Park, Chi-Won; Seo, Yuri; Sun, Teh-Jen; Lee, Ga-Won; Huh, Eui-Nam. Small Object Detection Technology Using Multi-Modal Data Based on Deep Learning. 2023 International Conference on Information Networking (ICOIN), 2023: 420-422.
  • [22] Yang, Yang; Wu, Yi-Feng; Zhan, De-Chuan; Jiang, Yuan. Deep Multi-modal Learning with Cascade Consensus. PRICAI 2018: Trends in Artificial Intelligence, Part II, vol. 11013, 2018: 64-72.
  • [23] Roostaiyan, Seyed Mahdi; Imani, Ehsan; Baghshah, Mahdieh Soleymani. Multi-modal deep distance metric learning. Intelligent Data Analysis, 2017, 21(6): 1351-1369.
  • [24] Duan, Jiali; Chen, Liqun; Tran, Son; Yang, Jinyu; Xu, Yi; Zeng, Belinda; Chilimbi, Trishul. Multi-modal Alignment using Representation Codebook. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 15630-15639.
  • [25] Shi, Yuge; Siddharth, N.; Paige, Brooks; Torr, Philip H. S. Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models. Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019, vol. 32.
  • [26] Li, Lishan; Liu, Ying; Wu, Jianping; He, Lin; Ren, Gang. Multi-modal Representation Learning for Successive POI Recommendation. Asian Conference on Machine Learning, vol. 101, 2019: 441-456.
  • [27] Verma, Mridula; Shukla, Kaushal Kumar. Fast Multi-Modal Unified Sparse Representation Learning. Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (ICMR'17), 2017: 448-452.
  • [28] Liu, Hao; Li, Ting; Hu, Renjun; Fu, Yanjie; Gu, Jingjing; Xiong, Hui. Joint Representation Learning for Multi-Modal Transportation Recommendation. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 1036-1043.
  • [29] Fang, Quan; Zhang, Xiaowei; Hu, Jun; Wu, Xian; Xu, Changsheng. Contrastive Multi-Modal Knowledge Graph Representation Learning. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(9): 8983-8996.
  • [30] Zhao, Qilu; Wang, Jiayan; Li, Zongmin. Supervised Multi-modal Dictionary Learning for Clothing Representation. Proceedings of the Fifteenth IAPR International Conference on Machine Vision Applications (MVA 2017), 2017: 51-54.